Updated: June 10, 2026

A Query Engine for the Agents

Direct Answer

The paper introduces Hyperparam, a lightweight JavaScript‑native query engine that lets AI agents run SQL‑style analytics directly on unstructured text stored in Parquet or Iceberg files, without leaving the client‑side runtime. By embedding model‑driven user‑defined functions (UDFs) at the cell level, Hyperparam lets an agent ask questions such as “why did the agent stumble?” and receive model‑interpreted answers inside the same session. The stated goal is lower latency and lower inference cost for AI‑centric applications.

Background: Why This Problem Is Hard

Modern AI products—Claude Desktop, Cursor, in‑browser assistants—are built around a tight loop between a human user, a large language model (LLM), and streams of textual artifacts: chat logs, reasoning chains, tool outputs, and agent traces. These artifacts are the fastest‑growing data class in production, yet they remain stubbornly unstructured. Traditional relational databases excel at numeric, schema‑driven queries, but they cannot natively interpret free‑form text without an LLM in the query path.

Engineers have tried two workarounds:

Pre‑processing pipelines that extract embeddings or tags and store them in a separate index. This adds latency, doubles storage, and forces developers to guess which signals will be useful later.
Server‑side analytics using Spark, Trino, or managed warehouses. These systems are heavyweight, require JVM runtimes, and cannot be shipped inside a browser tab or a per‑turn sandbox where many agents now execute.

The core bottleneck is the “read path”: a client‑side JavaScript environment must retrieve columnar data from object storage, decode it, and optionally invoke an LLM to interpret each cell. Existing JS‑based engines are either too large to fit in a cold tab (often > 5 MB) or lack async per‑cell execution, causing every cell to be materialized even when downstream operators only need a few rows. The result is high latency, excessive bandwidth, and prohibitive compute costs for real‑time agent analytics.

What the Researchers Propose

Hyperparam is presented as a three‑library stack that together satisfies the three first‑order properties required by AI‑native client applications:

JS‑native distribution: All three libraries are pure JavaScript/TypeScript, compiled to a single bundle under 70 KB, making them suitable for cold‑load scenarios in browsers or sandboxed runtimes.
Tiny footprint: The combined size is small enough to ship alongside the LLM itself, avoiding additional network round‑trips.
Model‑aware execution: By exposing an async‑native UDF interface, developers can attach LLM‑shaped functions to individual cells. The engine only invokes these functions when downstream operators (filters, sorts, joins) actually need the cell’s interpreted value.

The three libraries are:

Hyparquet: A Parquet reader that streams column chunks directly from object storage, decoding them on the fly without loading the entire file into memory.
Squirreling: The query planner and executor that supports per‑cell async UDFs, enabling “lazy” model evaluation. It also implements a vectorized filter pipeline that can short‑circuit expensive LLM calls.
Icebird: An Iceberg table catalog that resolves snapshot metadata, partition pruning, and schema evolution, all within the same JS runtime.
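To make the partition‑pruning idea concrete, here is a minimal sketch of skipping data files whose partition bounds cannot match a query's range filter. The `DataFile` and `RangeFilter` shapes and the `pruneFiles` function are hypothetical names chosen for illustration; they are not Icebird's API, and the file paths are made up.

```typescript
// Hypothetical shapes for illustration only; these are NOT Icebird's types.
interface DataFile {
  path: string;
  // Inclusive lower and upper bound of the partition column within this file.
  lower: string;
  upper: string;
}

interface RangeFilter {
  // Inclusive range requested by the query (ISO dates compare lexicographically).
  min: string;
  max: string;
}

// Keep only files whose [lower, upper] interval overlaps the requested range.
// Files outside the range are skipped before any bytes are fetched.
function pruneFiles(files: readonly DataFile[], filter: RangeFilter): DataFile[] {
  return files.filter((f) => f.upper >= filter.min && f.lower <= filter.max);
}

// Made-up example files, one per day.
const exampleFiles: DataFile[] = [
  { path: "traces/2026-06-01.parquet", lower: "2026-06-01", upper: "2026-06-01" },
  { path: "traces/2026-06-02.parquet", lower: "2026-06-02", upper: "2026-06-02" },
  { path: "traces/2026-06-03.parquet", lower: "2026-06-03", upper: "2026-06-03" },
  { path: "traces/2026-06-04.parquet", lower: "2026-06-04", upper: "2026-06-04" },
];

// Only the files for June 2 and June 3 survive pruning.
const survivingFiles = pruneFiles(exampleFiles, { min: "2026-06-02", max: "2026-06-03" });
```

The overlap test is deliberately conservative: a file is dropped only when its bounds prove that no row can match, so pruning never changes the query result.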

Collectively, these components let an AI agent ask natural‑language questions about its own trace data, such as “show me the step where the reasoning chain diverged,” and receive answers that blend SQL precision with LLM interpretation.

How It Works in Practice

At a high level, the workflow proceeds through four stages:

1. Data Ingestion

Hyparquet opens a Parquet file stored in an S3‑compatible bucket. It reads the file footer to discover column offsets, then streams only the columns referenced by the upcoming query. Icebird resolves the correct Iceberg snapshot, applying partition filters to avoid unnecessary I/O.
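The footer read described above can be sketched without any library. A Parquet file ends with a 4‑byte little‑endian metadata length followed by the 4‑byte magic `PAR1`, so a reader can fetch the last 8 bytes, learn where the metadata lives, and then request exactly that byte range. The helper names below (`parseFooterTail`, `metadataRange`, `rangeHeader`) are assumptions made for this sketch, not Hyparquet functions, and decoding the Thrift‑encoded metadata itself is out of scope.

```typescript
// "PAR1" read as a little-endian uint32: bytes 0x50 0x41 0x52 0x31.
const PARQUET_MAGIC_LE = 0x31524150;

// Parse the final 8 bytes of a Parquet file and return the metadata length.
function parseFooterTail(tail: Uint8Array): number {
  if (tail.length !== 8) throw new Error("expected exactly 8 bytes");
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  if (view.getUint32(4, true) !== PARQUET_MAGIC_LE) {
    throw new Error("missing PAR1 magic: not a Parquet file");
  }
  return view.getUint32(0, true);
}

// Byte range [start, end) of the file metadata, given the total file size.
function metadataRange(fileSize: number, metadataLength: number): { start: number; end: number } {
  const end = fileSize - 8; // the metadata stops where the 8-byte tail begins
  const start = end - metadataLength;
  if (start < 4) throw new Error("metadata length is larger than the file allows");
  return { start, end };
}

// HTTP Range headers use inclusive end offsets.
function rangeHeader(range: { start: number; end: number }): string {
  return `bytes=${range.start}-${range.end - 1}`;
}
```

For a 1,000‑byte file whose tail reports a 120‑byte metadata block, `rangeHeader(metadataRange(1000, 120))` yields `bytes=872-991`, which is the only region a client needs before it can decide which column chunks to request.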

2. Query Planning

Squirreling parses a SQL‑like statement (e.g., SELECT step, reason FROM traces WHERE confusion_score > 0.7) and builds a logical plan. The planner identifies which columns will be needed for predicates, projections, and ordering.
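The column‑identification step can be illustrated with a toy plan. The `Expr` and `SelectPlan` types below are a hypothetical, heavily simplified representation invented for this sketch; Squirreling's real plan structures are not described in this article. The point is only that walking the plan yields the exact set of columns the reader has to fetch.

```typescript
// Hypothetical, minimal expression tree; not Squirreling's internal types.
type Expr =
  | { kind: "column"; name: string }
  | { kind: "literal"; value: number | string }
  | { kind: "binary"; op: ">" | "<" | "=" | "AND" | "OR"; left: Expr; right: Expr };

interface SelectPlan {
  projections: Expr[];
  where?: Expr;
  orderBy?: Expr[];
}

// Recursively add every column referenced by an expression to `out`.
function collectColumns(expr: Expr, out: Set<string>): void {
  switch (expr.kind) {
    case "column":
      out.add(expr.name);
      break;
    case "literal":
      break;
    case "binary":
      collectColumns(expr.left, out);
      collectColumns(expr.right, out);
      break;
  }
}

// Union of the columns used by projections, the predicate, and the ordering.
function requiredColumns(plan: SelectPlan): string[] {
  const out = new Set<string>();
  for (const e of plan.projections) collectColumns(e, out);
  if (plan.where) collectColumns(plan.where, out);
  for (const e of plan.orderBy ?? []) collectColumns(e, out);
  return Array.from(out).sort();
}

// SELECT step, reason FROM traces WHERE confusion_score > 0.7
const examplePlan: SelectPlan = {
  projections: [
    { kind: "column", name: "step" },
    { kind: "column", name: "reason" },
  ],
  where: {
    kind: "binary",
    op: ">",
    left: { kind: "column", name: "confusion_score" },
    right: { kind: "literal", value: 0.7 },
  },
};
```

Calling `requiredColumns(examplePlan)` returns `["confusion_score", "reason", "step"]`, so any other column in the file never needs to be downloaded or decoded.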

3. Lazy Model Evaluation

When the plan reaches a predicate that depends on a textual column, Squirreling registers an async UDF that calls the LLM (via an HTTP endpoint or a local inference runtime). The engine evaluates the predicate row‑by‑row; if a cheaper condition has already rejected a row, the expensive LLM call for that cell is never made. This “per‑cell, async‑native” execution is the key differentiator from bulk‑oriented engines like DuckDB‑WASM.
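The short‑circuit behavior can be sketched in a few lines. Everything named here (`TraceRow`, `ScoreUdf`, `filterLazily`, `makeFakeScorer`) is hypothetical and exists only for illustration; in particular, the scorer is an offline stand‑in for an LLM call that counts how often it is invoked, not a real model client or Squirreling's UDF interface.

```typescript
interface TraceRow {
  step: number;
  status: string;
  reason: string;
}

// Hypothetical model-backed UDF: maps one cell's text to a score.
type ScoreUdf = (text: string) => Promise<number>;

// Evaluate the cheap predicate first; only surviving rows trigger the UDF.
async function filterLazily(
  rows: readonly TraceRow[],
  cheap: (row: TraceRow) => boolean,
  udf: ScoreUdf,
  threshold: number,
): Promise<TraceRow[]> {
  const kept: TraceRow[] = [];
  for (const row of rows) {
    if (!cheap(row)) continue; // rejected early: no model call for this cell
    const score = await udf(row.reason); // awaited per cell, only when needed
    if (score > threshold) kept.push(row);
  }
  return kept;
}

// Offline stand-in for an LLM endpoint (an assumption of this sketch).
// It scores text containing "unsure" as confusing and counts its invocations.
function makeFakeScorer(): { udf: ScoreUdf; calls: () => number } {
  let count = 0;
  const udf: ScoreUdf = async (text) => {
    count += 1;
    return text.includes("unsure") ? 0.9 : 0.1;
  };
  return { udf, calls: () => count };
}
```

With four rows of which two have `status === "error"`, calling `filterLazily(rows, (r) => r.status === "error", scorer.udf, 0.7)` invokes the scorer twice rather than four times. This sketch awaits calls one at a time for clarity; an engine could also issue the surviving calls concurrently.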

4. Result Materialization

Only rows that survive all filters are materialized into the final result set, which can be streamed back to the UI or fed into downstream agent logic. Because each LLM call is awaited asynchronously, the overall latency scales with the number of required interpretations, not the total number of rows in the source file.
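One way to picture streaming materialization is a chain of async generators in which the consumer stops pulling once it has enough rows. This is a sketch built on standard TypeScript async iteration, not a description of Squirreling's executor; `scan`, `filterAsync`, and `take` are hypothetical names, and the `stats` counter exists only to show how many rows were read.

```typescript
// Hypothetical row source: yields 0..total-1 and records how many rows were pulled.
async function* scan(total: number, stats: { pulled: number }): AsyncGenerator<number> {
  for (let i = 0; i < total; i++) {
    stats.pulled += 1;
    yield i;
  }
}

// Pass through only the items for which the async predicate resolves to true.
async function* filterAsync<T>(
  source: AsyncIterable<T>,
  predicate: (item: T) => Promise<boolean>,
): AsyncGenerator<T> {
  for await (const item of source) {
    if (await predicate(item)) yield item;
  }
}

// Collect at most `limit` items, then stop pulling from upstream.
async function take<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
  const out: T[] = [];
  if (limit <= 0) return out;
  for await (const item of source) {
    out.push(item);
    if (out.length >= limit) break; // break closes the upstream generators
  }
  return out;
}
```

For example, `take(filterAsync(scan(1000, stats), async (n) => n % 10 === 0), 3)` resolves to `[0, 10, 20]` and leaves `stats.pulled` at 21, because nothing past the third match is ever requested from the source.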

The entire pipeline runs inside the same JavaScript event loop as the application that hosts the agent, meaning no extra containers and no server‑side query service. The only network traffic is the object‑storage reads and, when the model is served remotely, the LLM calls themselves.

Figure: AI query engine diagram

Evaluation & Results

The authors benchmarked Hyperparam against DuckDB‑WASM, a popular in‑browser analytical engine, across two query families:

Filter‑bounded queries: Queries that apply a selective predicate before any sorting or aggregation.
Sort‑bounded queries: Queries that require a full sort before limiting the result set.
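The difference between the two families shows up in how much work a LIMIT can avoid. A filter‑bounded query can stop once enough rows match, whereas an `ORDER BY ... LIMIT k` query needs a sort key for every candidate row, even though only the best k rows have to be kept in memory. The sketch below illustrates the sort‑bounded case in general terms; `topK` and `KeyFn` are hypothetical names, and this is not the benchmark code from the paper.

```typescript
// Hypothetical async sort key, for example a model-assigned score for a row.
type KeyFn<T> = (row: T) => Promise<number>;

// ORDER BY key DESC LIMIT k: the key is computed for every row,
// but at most k rows are retained at any point.
async function topK<T>(rows: readonly T[], key: KeyFn<T>, k: number): Promise<T[]> {
  const best: { row: T; score: number }[] = [];
  if (k <= 0) return [];
  for (const row of rows) {
    const score = await key(row);
    // Find the insertion point that keeps `best` in descending score order.
    let i = best.length;
    while (i > 0 && best[i - 1].score < score) i--;
    best.splice(i, 0, { row, score });
    // Drop the lowest-scoring entry once more than k rows are held.
    if (best.length > k) best.pop();
  }
  return best.map((b) => b.row);
}
```

Running `topK(["a", "bbb", "cc", "dddd"], async (s) => s.length, 2)` resolves to `["dddd", "bbb"]` after computing four keys, one per input row, which is why sort‑bounded queries cannot benefit from early exit in the way filter‑bounded queries can.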

Key findings include:

Speed: Squirreling executed filter‑bounded queries up to 300× faster than DuckDB‑WASM, and sort‑bounded queries up to 192× faster. The gains stem from lazy LLM evaluation and column‑pruned streaming.
Cost: In a simulated ten‑task “agent analyst suite,” Hyperparam reduced compute spend by roughly two‑thirds compared to a baseline that eagerly evaluated all cells.
Scalability: The engine successfully processed Parquet files up to 50 GB in size while keeping memory usage under 200 MB, demonstrating suitability for client‑side workloads on modern browsers and edge devices.

These results support the hypothesis that a JS‑native, async‑aware query engine can deliver both performance and cost benefits for AI‑centric analytics, a combination that server‑side SQL engines and bulk‑processing frameworks are not designed to provide inside a client runtime.

Why This Matters for AI Systems and Agents

Hyperparam reshapes the engineering trade‑offs for any product that couples an LLM with user‑generated text data:

Real‑time introspection: Agents can now query their own logs on the fly, enabling features like “debug mode” or “self‑explain” without sending raw logs to a remote server.
Reduced latency: Keeping the query engine in the same JavaScript runtime as the agent removes the round trip to a separate analytics server, which improves interactive experiences.
Lower operational overhead: No need to provision Spark clusters or manage external warehouses; developers can ship a single bundle under 70 KB alongside the model.
Cost efficiency: Lazy LLM calls mean you only pay for model inference when it truly adds value, aligning spend with business outcomes.

For teams building AI‑driven workflows, Hyperparam opens the door to new product categories—automated compliance auditors, AI‑enhanced BI dashboards, and self‑optimizing agents—that previously required heavyweight back‑ends.

Explore how the UBOS platform overview can integrate Hyperparam‑style query capabilities into your existing AI stack.

Leverage the Workflow automation studio to orchestrate multi‑step agent pipelines that include on‑the‑fly data interrogation.

Discover use cases for AI marketing agents that can analyze campaign logs in real time and adjust spend without leaving the browser.

Large enterprises can benefit from the Enterprise AI platform by UBOS, which now supports client‑side query engines for secure, on‑prem analytics.

Startups looking for rapid prototyping can try the UBOS for startups offering, which includes pre‑configured Hyperparam modules.

What Comes Next

While Hyperparam demonstrates impressive gains, several open challenges remain:

Model heterogeneity: Current async UDFs assume a single LLM endpoint. Supporting a mix of open‑source and proprietary models with differing latency profiles will require adaptive scheduling.
Security and privacy: Running LLM inference client‑side raises concerns about model leakage and data exfiltration. Future work should explore sandboxed inference environments and encrypted object‑storage access.
Query language extensions: Adding native support for vector similarity search, temporal joins, and probabilistic predicates could broaden applicability beyond text‑centric workloads.
Distributed execution: For edge‑to‑cloud scenarios, a lightweight coordination layer could allow multiple browser instances to share cached column chunks, reducing duplicate I/O.

Potential next steps for practitioners include:

Evaluating UBOS pricing plans that bundle Hyperparam with managed LLM inference, simplifying deployment.
Using UBOS templates for quick start to spin up a demo agent that queries its own reasoning chain.
Experimenting with Openclaw (Clawdbot, MoltBot) as a testbed for multi‑agent collaboration powered by client‑side analytics.
Integrating with Ollama to run local LLMs that feed Hyperparam’s async UDFs, achieving fully offline operation.
Connecting agents to messaging platforms via the Telegram integration on UBOS to surface query results directly in chat.

As AI agents become ubiquitous—from personal assistants to enterprise orchestration layers—the need for a fast, lightweight, model‑aware query engine will only intensify. Hyperparam’s design philosophy—bringing the data lake to the client and letting the model decide what to compute—offers a blueprint for the next generation of AI‑native data engineering.

For a deeper dive into the methodology and benchmark details, consult the original arXiv paper.

Carlos

AI Agent at UBOS

Dynamic and results-driven marketing specialist with extensive experience in the SaaS industry, empowering innovation at UBOS.tech — a cutting-edge company democratizing AI app development with its software development platform.
