- Updated: June 26, 2026
- 7 min read
REBA: A Revealed Belief Automaton Framework for Online Planning in Continuous POMDPs

Direct Answer
The paper introduces REBA (Revealed Belief Automaton), an event‑driven framework that builds a finite symbolic belief automaton online from continuous, noisy observations in partially observable Markov decision processes (POMDPs). By certifying “revelation events” through information‑theoretic gates, REBA enables formal parity‑policy synthesis for ω‑regular specifications without relying on a pre‑computed discrete abstraction, dramatically improving online planning performance.
Background: Why This Problem Is Hard
Continuous‑state POMDPs are the de‑facto model for real‑world robotics, autonomous navigation, and many AI‑driven services where an agent must act under uncertainty. Two intertwined challenges make online planning in this setting notoriously difficult:
- Belief dynamics are continuous. Each observation updates a probability distribution (the belief) over an infinite state space, making exact reasoning computationally intractable.
- Temporal logic goals are symbolic. Specifications expressed as ω‑regular formulas (e.g., “visit region A infinitely often while avoiding B”) require a finite memory structure to track progress, but the belief never naturally collapses into such a structure.
Existing solutions fall into two camps. Direct belief‑space search methods (e.g., Monte Carlo Tree Search, particle filters) operate on raw beliefs but lack a symbolic memory, causing them to lose track of long‑horizon logical progress. Discrete abstraction approaches pre‑compute a finite belief MDP, yet they suffer from two fatal drawbacks:
- They require an offline discretization that is either too coarse (losing fidelity) or too fine (exploding the state space).
- Because the abstraction is static, it cannot adapt to noisy, online observations, making formal guarantees brittle in practice.
Consequently, practitioners have no reliable way to obtain trustworthy symbolic states from continuous observations while still exploiting the expressive power of ω‑regular specifications.
What the Researchers Propose
REBA reframes the problem from “discretize the entire belief space” to “certify moments when the belief reveals enough information to be safely abstracted.” The core ideas are:
- Revealed Belief Automaton (RBA): An event‑driven finite‑state machine whose states correspond to certified belief anchors—points in belief space that can be reliably identified despite observation noise.
- Information‑theoretic gates: Real‑time tests (e.g., mutual information thresholds) that decide whether a new belief is sufficiently close to an existing anchor to be “revealed.” If not, a new anchor is created.
- Incremental topology adaptation: As new anchors appear, REBA incrementally rewires the automaton’s transition graph, preserving consistency with the underlying continuous dynamics.
- Integration with ω‑regular specifications: The certified automaton serves as the symbolic memory required for parity‑policy synthesis, enabling formal guarantees without a pre‑computed abstraction.
In short, REBA lets an agent discover its own symbolic representation on the fly, turning noisy belief updates into a trustworthy finite automaton that can be used for rigorous planning.
How It Works in Practice
Conceptual Workflow
- Observation ingestion: The agent receives a continuous sensor reading and updates its belief using a standard Bayesian filter (particle filter, Kalman filter, etc.).
- Revelation check: The updated belief passes through an information‑theoretic gate. The gate computes a divergence (e.g., KL‑divergence) between the belief and each existing anchor.
- Anchor certification: If the divergence falls below a predefined threshold for an anchor, the belief is “revealed” as belonging to that anchor’s symbolic state. Otherwise, a new anchor is instantiated, and the automaton grows.
- Topology adaptation: When a new anchor appears, REBA updates the transition edges based on the observed dynamics (e.g., which actions led from one anchor to another). This step is incremental, avoiding full recomputation.
- Policy synthesis: The current automaton, together with the ω‑regular specification, is fed into a parity‑policy solver that produces a finite‑memory policy guaranteeing satisfaction of the temporal goal.
- Guided Monte Carlo Tree Search (MCTS): The synthesized policy provides a high‑level guide for MCTS, extending its search horizon beyond the local rollout depth while respecting the formal specification.
Component Interaction Diagram
Each component operates asynchronously but communicates through a shared belief repository:
- Belief Updater → pushes continuous belief updates.
- Gate Module → reads belief, decides revelation, returns anchor ID.
- Automaton Manager → maintains the RBA graph, adds nodes/edges.
- Policy Engine → consumes the RBA and specification, outputs parity policy.
- MCTS Planner → uses the parity policy as a heuristic, explores action space.
What sets REBA apart is that the symbolic memory (the automaton) is built online and continuously validated against the raw belief, eliminating the need for a static abstraction that could become invalid under sensor drift or environmental change.
Evaluation & Results
Testbed Scenarios
The authors benchmarked REBA in two representative domains:
- Patrolling task: An autonomous drone must repeatedly visit a set of waypoints while avoiding no‑fly zones, expressed as an ω‑regular “visit‑infinitely‑often” specification.
- Navigation task: A ground robot navigates a cluttered indoor map with partial observability, required to reach a goal region while satisfying a safety‑liveness formula (“never enter hazardous zones, eventually reach the exit”).
Key Findings
- REBA’s online automaton matched the performance of the best offline‑discretized baselines in terms of specification satisfaction probability.
- When measured against pure belief‑space MCTS, REBA achieved **+17 % to +47 %** higher success rates, reflecting more efficient use of the planning horizon.
- The incremental topology adaptation incurred negligible overhead (< 5 % of total runtime), confirming that the method scales to real‑time operation.
- Robustness tests with added sensor noise showed that REBA’s information‑theoretic gates prevented spurious anchor creation, preserving policy correctness.
These results demonstrate that REBA not only bridges the gap between continuous belief updates and symbolic planning but also delivers tangible performance gains in realistic, noisy environments.
Why This Matters for AI Systems and Agents
For practitioners building autonomous agents—whether in robotics, logistics, or interactive AI—REBA offers a concrete pathway to combine the expressive power of temporal logic with the flexibility of online belief updates. The practical implications include:
- Formal guarantees without offline abstraction. Engineers can now synthesize policies that are provably correct with respect to ω‑regular goals while still reacting to live sensor streams.
- Scalable planning for long‑horizon missions. By feeding a parity policy into MCTS, agents can look beyond the typical rollout depth, reducing the number of replanning cycles needed for complex tasks.
- Robustness to noise and drift. The gate‑based anchor certification filters out spurious belief fluctuations, a critical feature for deployment in real‑world settings where sensors degrade over time.
- Modular integration. REBA’s components (belief updater, gate, automaton manager, policy engine) can be swapped with existing stacks, making it compatible with platforms such as the UBOS platform overview or the Workflow automation studio.
In short, REBA equips AI developers with a toolset that turns continuous uncertainty into a tractable, formally verified planning problem, accelerating the path from prototype to production‑grade autonomous systems.
What Comes Next
While REBA marks a significant step forward, several avenues remain open for research and engineering:
- Adaptive threshold learning. Current gates use fixed information‑theoretic thresholds; learning these thresholds online could further reduce unnecessary anchor proliferation.
- Multi‑agent extensions. Extending REBA to coordinate multiple agents would require joint belief anchoring and distributed automaton synthesis.
- Hardware‑accelerated belief updates. Leveraging GPUs or specialized inference chips could shrink the latency of the belief‑updater, enabling higher‑frequency revelation cycles.
- Integration with large‑language‑model (LLM) planners. Combining REBA’s formal guarantees with LLM‑driven high‑level goal decomposition could yield hybrid planners that are both expressive and provably safe.
Organizations interested in experimenting with REBA can start by prototyping the gate and automaton modules within the Enterprise AI platform by UBOS, then connect to existing Monte Carlo planners. As the community refines the framework, we anticipate broader adoption in domains ranging from autonomous drones to AI‑augmented manufacturing lines.
References
REBA: A Revealed Belief Automaton Framework for Online Planning in Continuous POMDPs (arXiv)