- Updated: January 30, 2026
- 7 min read
Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under‑Specified Reasoning
Direct Answer
The paper “Teaching LLMs to Ask: Self‑Querying Category‑Theoretic Planning for Under‑Specified Reasoning” introduces Self‑Querying Bidirectional Categorical Planning (SQ‑BCP), a framework that equips large language models (LLMs) with the ability to generate and answer their own clarifying questions, enabling reliable planning even when the problem description is incomplete or ambiguous. By grounding the reasoning process in category‑theoretic constructs, SQ‑BCP dramatically improves the consistency and safety of LLM‑driven agents operating under partial observability.
Background: Why This Problem Is Hard
Modern AI agents often rely on LLMs to translate natural‑language goals into executable plans. In real‑world deployments—such as personal assistants, autonomous bots, or workflow orchestrators—the input specifications are rarely exhaustive. Missing preconditions, hidden constraints, or ambiguous user intents create a classic partial observability problem that can cause agents to take unsafe actions, generate nonsensical steps, or simply fail.
Existing approaches address this gap in three main ways:
- Prompt engineering: Hand‑crafted prompts attempt to coax the model into asking clarifying questions, but they are brittle and do not scale across domains.
- External verification: Separate modules (e.g., rule‑based validators) check plan feasibility after generation, often catching errors too late to recover gracefully.
- Reinforcement learning from human feedback (RLHF): While RLHF can improve alignment, it still depends on the model recognizing that a query is needed, which is not guaranteed in under‑specified contexts.
These methods share a critical limitation: they treat question generation as an afterthought rather than an integral part of the planning loop. Consequently, agents lack a systematic way to reason about what they do not know, leading to brittle behavior in safety‑critical applications.
What the Researchers Propose
SQ‑BCP reframes planning as a self‑querying categorical process. At a high level, the framework consists of three cooperating components:
- Self‑Query Generator: An LLM module that, given a partially specified goal, formulates targeted clarification questions aimed at resolving unknown preconditions.
- Bidirectional Search Engine: A planner that simultaneously expands forward from the initial state and backward from the goal, using the answers to the generated queries to prune infeasible branches.
- Categorical Verifier: A lightweight, mathematically grounded verifier that classifies each state‑transition pair into one of three categories—Satisfied (Sat), Violated (Viol), or Unknown (Unk)—based on category‑theoretic morphisms.
By iterating between these components, SQ‑BCP transforms the planning problem into a dialogue between the model and an external knowledge source, ensuring that every step is justified before execution.
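To make the three roles concrete, they can be sketched as a small set of Python types. Everything here (`Verdict`, `Transition`, `TableVerifier`) is hypothetical naming for illustration; the paper describes these components conceptually, not as a code API, and a real verifier would reason over morphisms rather than a lookup table.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SAT = "satisfied"   # transition respects the domain constraints
    VIOL = "violated"   # a constraint is broken; the edge is pruned
    UNK = "unknown"     # insufficient information; triggers a self-query

@dataclass(frozen=True)
class Transition:
    source: str  # symbolic state (an object in the category)
    action: str  # action name (a morphism)
    target: str  # symbolic successor state

class TableVerifier:
    """Toy stand-in for the Categorical Verifier, backed by a fact table."""

    def __init__(self, facts: dict[tuple[str, str], bool]):
        # (state, action) -> does the transition satisfy its constraints?
        self.facts = facts

    def classify(self, t: Transition) -> Verdict:
        known = self.facts.get((t.source, t.action))
        if known is None:
            return Verdict.UNK
        return Verdict.SAT if known else Verdict.VIOL
```

The three-valued `Verdict` is the key interface: `UNK` is not an error state but an explicit signal that routes control back to the Self-Query Generator.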
How It Works in Practice
Conceptual Workflow
- Goal Ingestion: The user supplies a high‑level instruction (e.g., “prepare a vegan lasagna”). The system parses this into an initial symbolic goal representation.
- Initial Query Generation: The Self‑Query Generator scans the goal for missing preconditions (ingredients, equipment, dietary restrictions) and emits a set of clarifying questions.
- Answer Acquisition: Answers are retrieved from a knowledge base, external APIs, or the user themselves. Each answer is fed back into the planner.
- Bidirectional Expansion: The planner grows a forward tree from the known initial state and a backward tree from the clarified goal, using the answers to fill previously unknown slots.
- Categorical Verification: For every newly created edge, the verifier checks whether the transition satisfies the categorical constraints. Edges classified as Viol are discarded; Sat edges are kept; Unk edges trigger another round of self‑querying.
- Plan Synthesis: Once a consistent forward‑backward path is found, the system extracts a linearized plan and presents it to the user or downstream executor.
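The six steps above can be condensed into a single iterative loop. The sketch below is a minimal rendering under stated assumptions: `knowledge` is a plain dict standing in for the knowledge base, and `oracle` stands in for whatever external source (API, user, retrieval system) answers a self-query. Neither reflects the paper's actual interfaces.

```python
def plan_with_self_queries(candidate_steps, knowledge, oracle, max_rounds=5):
    """Keep steps whose preconditions are confirmed; self-query the rest.

    candidate_steps: list of (step, precondition) pairs.
    knowledge: dict mapping preconditions to True/False (absent = unknown).
    oracle: callable answering a precondition query with True/False.
    """
    confirmed, unknown = [], list(candidate_steps)
    for _ in range(max_rounds):
        still_unknown = []
        for step, precondition in unknown:
            answer = knowledge.get(precondition)        # answer acquisition
            if answer is True:
                confirmed.append(step)                  # Sat: keep the step
            elif answer is False:
                continue                                # Viol: discard it
            else:
                still_unknown.append((step, precondition))  # Unk: re-query
        if not still_unknown:
            break
        for _, precondition in still_unknown:           # self-query round
            knowledge[precondition] = oracle(precondition)
        unknown = still_unknown
    return confirmed
```

Note how answers are written back into `knowledge`, so each clarification is asked at most once and later rounds shrink monotonically.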
Interaction Between Components
The three components form a closed feedback loop:
- The Self‑Query Generator is conditioned on the current Unk states identified by the verifier.
- The Bidirectional Search Engine uses answers to prune the search space, dramatically reducing combinatorial explosion.
- The Categorical Verifier provides a mathematically sound guarantee that each transition respects the underlying domain ontology, expressed as categorical morphisms.
This loop continues until no Unk states remain, at which point the plan is considered both feasible and safe.
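To make the search half of the loop concrete, here is a generic bidirectional BFS over a directed state graph. This is a textbook sketch, not the paper's engine: it omits the verifier-driven pruning and the self-query rounds, and the graph representation is invented for illustration.

```python
from collections import deque

def bidirectional_meet(forward_edges, start, goal):
    """BFS from both ends of a directed graph; return a meeting state or None."""
    # Build the reversed adjacency so the goal side can expand backward.
    backward_edges = {}
    for src, targets in forward_edges.items():
        for t in targets:
            backward_edges.setdefault(t, []).append(src)
    seen_f, seen_b = {start}, {goal}
    frontier_f, frontier_b = deque([start]), deque([goal])
    while frontier_f or frontier_b:
        # Expand the smaller frontier first, a standard bidirectional heuristic.
        if frontier_f and (not frontier_b or len(frontier_f) <= len(frontier_b)):
            state = frontier_f.popleft()
            for nxt in forward_edges.get(state, []):
                if nxt in seen_b:
                    return nxt          # forward tree touched the backward tree
                if nxt not in seen_f:
                    seen_f.add(nxt)
                    frontier_f.append(nxt)
        elif frontier_b:
            state = frontier_b.popleft()
            for prev in backward_edges.get(state, []):
                if prev in seen_f:
                    return prev         # backward tree touched the forward tree
                if prev not in seen_b:
                    seen_b.add(prev)
                    frontier_b.append(prev)
    return None
```

In SQ-BCP's version of this loop, each candidate edge would additionally pass through the verifier, with Viol edges dropped and Unk edges deferred until a self-query resolves them.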
What Makes This Approach Different
- Self‑Querying as Core Logic: Unlike ad‑hoc prompting, SQ‑BCP treats question generation as a first‑class operation, driven by formal uncertainty detection.
- Bidirectional Search: Simultaneous forward and backward expansion leverages goal information early, cutting down on unnecessary exploration.
- Category‑Theoretic Guarantees: By modeling states and actions as objects and morphisms, the verifier offers a principled way to reason about partial knowledge, something most neural planners lack.
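The "objects and morphisms" framing can be made tangible in a few lines: if states are objects and actions are morphisms, a linear plan type-checks exactly when adjacent actions compose. This is only the simplest consequence of the categorical view (the paper's machinery is richer than a domain/codomain check), and the names below are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Morphism:
    name: str
    dom: str  # source object (the state the action requires)
    cod: str  # target object (the state the action produces)

def composable(plan: list[Morphism]) -> bool:
    """A sequence of morphisms composes iff each codomain matches the next domain."""
    return all(f.cod == g.dom for f, g in zip(plan, plan[1:]))
```

A neural planner has no built-in analogue of this check; the categorical framing turns "does this plan make sense?" into a mechanical composition test.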
Evaluation & Results
Testbeds and Scenarios
The authors evaluated SQ‑BCP on two publicly available corpora that exemplify under‑specified reasoning:
- WikiHow: A collection of procedural articles where steps often omit prerequisites (e.g., “bake a cake” without listing required ingredients).
- RecipeNLG: A dataset of natural‑language recipes that frequently assume culinary knowledge not explicitly stated.
Both datasets were transformed into planning problems where the model had to generate a complete, executable sequence of actions given only the headline instruction.
Key Findings
| Metric | Baseline (Prompt‑Only) | SQ‑BCP |
|---|---|---|
| Plan Completion Rate | 62% | 89% |
| Safety Violations (post‑hoc check) | 18% | 3% |
| Average Queries per Task | — | 2.4 |
| Human Preference Score | 3.1 / 5 | 4.6 / 5 |
These results demonstrate that SQ‑BCP not only produces more complete plans but also cuts post‑hoc safety violations sixfold, from 18% to 3%. Importantly, the modest number of generated queries (2.4 per task on average) shows that the system resolves uncertainty efficiently without overwhelming the user.
Why the Findings Matter
From a practitioner’s perspective, the evaluation confirms three actionable insights:
- Embedding self‑querying directly into the planning loop yields tangible safety gains.
- Bidirectional search combined with categorical verification scales to realistic, noisy instruction sets.
- The approach remains lightweight enough to be integrated into existing LLM‑based pipelines without prohibitive computational overhead.
Why This Matters for AI Systems and Agents
For developers building trustworthy AI agents—whether for customer support, autonomous robotics, or workflow automation—the ability to ask the right question at the right time is a cornerstone of reliability. SQ‑BCP offers a concrete recipe for embedding that capability:
- Improved Agent Robustness: By systematically eliminating unknowns before execution, agents avoid costly failures in production environments.
- Enhanced Safety Guarantees: The categorical verifier provides a formal safety net that can be audited and extended to domain‑specific constraints.
- Better Human‑AI Collaboration: Users receive concise, targeted clarification prompts instead of being forced to anticipate every missing detail themselves.
Organizations looking to operationalize LLMs can adopt SQ‑BCP as a modular layer. For example, the UBOS agent framework can integrate the Self‑Query Generator as a pre‑processor, while the planning service can host the bidirectional engine and categorical verifier.
What Comes Next
Current Limitations
Despite its promise, SQ‑BCP has several open challenges:
- Domain Generalization: The categorical schemas used in the verifier were handcrafted for procedural tasks; extending them to domains like finance or healthcare will require richer ontologies.
- Query Cost: While the average query count is low, certain complex goals may trigger longer dialogue loops, impacting latency.
- Knowledge Source Dependence: The quality of answers hinges on external knowledge bases; noisy or outdated sources can re‑introduce uncertainty.
Future Research Directions
Potential avenues to address these gaps include:
- Learning categorical schemas automatically from large corpora, reducing manual engineering effort.
- Integrating retrieval‑augmented generation (RAG) pipelines to provide up‑to‑date answers for self‑queries.
- Exploring multi‑agent extensions where several specialized LLMs collaborate on different sub‑queries.
Potential Applications
Beyond the evaluated datasets, SQ‑BCP could empower a range of emerging AI products:
- Intelligent Process Automation: Automating enterprise SOPs where policy details are often implicit.
- Robotic Task Planning: Enabling robots to request missing environmental information before acting.
- Regulatory Compliance Assistants: Prompting users for missing compliance documentation in real time.
Developers interested in prototyping these ideas can explore the trustworthy AI toolkit for ready‑made components that align with the SQ‑BCP philosophy.
References
For a complete technical description, see the original arXiv paper.
