- Updated: June 10, 2026
PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

Direct Answer
PEAM (Parametric Embodied Agent Memory) introduces a hybrid memory system for Minecraft agents that converts episodic experience into permanent, parameter‑resident skills. By pairing a slow, reasoning‑heavy large language model (LLM) with a fast, Mixture‑of‑Experts LoRA module, the framework lets agents internalize both successes and failures, dramatically improving long‑horizon performance while avoiding the latency of retrieval‑based memory.
Background: Why This Problem Is Hard
Embodied agents operating in open‑world environments such as Minecraft must remember a vast, ever‑growing set of interactions—crafting recipes, terrain navigation, combat tactics, and social cues. Traditional approaches fall into two camps:
- Retrieval‑based memory: Agents store raw experience in external databases and query them at inference time. This yields high fidelity but incurs costly latency and scales poorly as the corpus expands.
- Parametric memory only: Skills are baked directly into model weights via fine‑tuning. While inference is fast, continual learning quickly leads to catastrophic forgetting, especially when tasks shift.
Both strategies struggle with the “experience‑to‑skill” gap: raw observations are noisy, and the agent must decide which moments are worth solidifying into permanent behavior. Moreover, most systems treat failure as a dead‑end rather than a learning signal, missing an opportunity to teach the agent how to correct itself.
What the Researchers Propose
PEAM proposes a three‑component architecture that bridges the gap between retrieval and parametric memory:
- Deliberative LLM: A large language model runs slowly but can perform open‑ended reasoning, plan generation, and contextual interpretation of the Minecraft world.
- Fast Mixture‑of‑Experts (MoE) LoRA module: A lightweight, multimodal adapter network composed of category‑specific experts (e.g., building, combat, resource gathering). Each expert lives in its own parameter subspace, preventing interference during continual learning.
- Consolidation engine: A self‑triggered mechanism evaluates experience, scores its “parameter‑worthiness,” and decides when to internalize it into the MoE via contrastive learning.
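To make the expert‑isolation idea concrete, here is a minimal plain‑Python sketch of a category‑routed LoRA layer. It assumes hard routing by a category label supplied by the planner, a frozen linear base layer, and standard LoRA initialization (A small and random, B zero). The class names, rank, and alpha values are hypothetical rather than taken from the paper, and the multimodal inputs PEAM uses are ignored here.

```python
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRAExpert:
    """One low-rank adapter: delta(x) = scale * B @ (A @ x)."""

    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float,
                 rng: random.Random):
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so a fresh expert leaves the base output unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def delta(self, x: Vector) -> Vector:
        return [self.scale * h for h in matvec(self.B, matvec(self.A, x))]


class CategoryMoELoRA:
    """Hypothetical frozen base layer plus one LoRA expert per skill category."""

    def __init__(self, base: Matrix, categories: list[str], rank: int = 2,
                 alpha: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.base = base  # shared weights, never updated during consolidation
        d_out, d_in = len(base), len(base[0])
        # Each category owns a separate adapter object: its own parameter block.
        self.experts = {c: LoRAExpert(d_in, d_out, rank, alpha, rng)
                        for c in categories}

    def forward(self, x: Vector, category: str) -> Vector:
        # Hard routing by category label (assumed): only one expert contributes.
        y = matvec(self.base, x)
        d = self.experts[category].delta(x)
        return [a + b for a, b in zip(y, d)]


model = CategoryMoELoRA(base=[[1.0, 0.0], [0.0, 1.0]],
                        categories=["building", "combat", "resource_gathering"])
x = [1.0, 2.0]
print(model.forward(x, "combat"))    # fresh experts are no-ops: [1.0, 2.0]

# Updating the combat expert does not touch the building expert's parameters.
model.experts["combat"].B[0][0] = 1.0
print(model.forward(x, "building"))  # still [1.0, 2.0]
```

Because consolidation only writes to the selected expert's `A` and `B`, behavior routed to other categories is unaffected by construction; interference can only arise through shared components such as the base layer or the router.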
The key insight is to treat failure–correction trajectories as first‑class training data. By jointly optimizing a behavioral‑cloning loss (to reproduce successful actions) and a contrastive loss (to separate corrected actions from the original failures), the agent learns not only what works but also how a failed attempt differs from the action that fixed it.
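The joint objective can be sketched for a single decision step with discrete actions. This sketch assumes the adapter outputs one logit per candidate action, that the contrastive term is an InfoNCE‑style loss with the failed action as the only negative, and that the two terms are combined with a hypothetical weight `lam` and temperature `tau`; the paper's exact formulation and weighting may differ.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over discrete action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def bc_loss(logits: list[float], corrected: int) -> float:
    """Behavioral cloning: cross-entropy toward the corrected action."""
    return -log_softmax(logits)[corrected]


def contrastive_loss(logits: list[float], corrected: int, failed: int,
                     tau: float = 1.0) -> float:
    """InfoNCE with the failed action as the single negative (assumed form).

    -log(e^{s+/tau} / (e^{s+/tau} + e^{s-/tau})) == softplus((s- - s+) / tau)
    """
    d = (logits[failed] - logits[corrected]) / tau
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))  # stable softplus


def joint_loss(logits: list[float], corrected: int, failed: int,
               lam: float = 0.5, tau: float = 1.0) -> float:
    """Weighted sum of both objectives (lam and tau are hypothetical)."""
    return (bc_loss(logits, corrected)
            + lam * contrastive_loss(logits, corrected, failed, tau))


# Three candidate actions: index 0 is the correction, index 1 the failed attempt.
uncertain = joint_loss([0.0, 0.0, 0.0], corrected=0, failed=1)
confident = joint_loss([2.0, -1.0, 0.0], corrected=0, failed=1)
print(uncertain, confident)
```

The behavioral‑cloning term alone would be satisfied by raising the corrected action's logit against all alternatives; the contrastive term adds pressure specifically against the action that previously failed.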
How It Works in Practice
Conceptual Workflow
- Observation: The agent perceives the Minecraft scene (visual pixels, inventory state, textual prompts).
- Deliberation: The slow LLM generates a high‑level plan (e.g., “build a nether portal”) and proposes an initial action sequence.
- Execution: The MoE LoRA module receives the plan, selects the appropriate expert, and executes reflexive actions in real time.
- Feedback Loop: If an action fails (e.g., the portal cannot be lit because the frame is incomplete), the environment returns a failure signal.
- Correction Generation: The LLM re‑examines the failure, produces a corrected trajectory, and tags the pair as a learning instance.
- Consolidation Decision: The parameter‑worthiness scorer evaluates the pair. If the score exceeds a dynamic, scale‑free threshold, the consolidation engine triggers internalization.
- Internalization: The MoE adapters are updated via a joint behavioral‑cloning and contrastive objective, embedding the new skill directly into parameters.
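The workflow above can be sketched as a single control loop. Every component here is a hypothetical stand‑in: `deliberate` and `correct` replace LLM calls, `env_step` is a toy environment, and the worthiness score and threshold are passed in as plain values for simplicity. Only the ordering of the steps follows the description above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LearningInstance:
    """A failure-correction pair tagged for possible consolidation."""
    category: str
    failed_action: str
    corrected_action: str


def deliberate(observation: str) -> tuple[str, list[str]]:
    """Hypothetical stand-in for the slow LLM planner: (category, plan)."""
    if "portal" in observation:
        return "building", ["place_obsidian_frame", "light_portal"]
    return "resource_gathering", ["mine_obsidian"]


def correct(failed_action: str) -> str:
    """Hypothetical stand-in for the LLM re-examining a failure."""
    return f"complete_frame_then_{failed_action}"


def run_episode(observation: str,
                env_step: Callable[[str], bool],
                score: Callable[[LearningInstance], float],
                threshold: float,
                consolidate: Callable[[LearningInstance], None]) -> None:
    category, plan = deliberate(observation)          # Deliberation
    for action in plan:
        # Execution: in PEAM the MoE LoRA expert for `category` would act here.
        if env_step(action):                          # Feedback loop
            continue
        instance = LearningInstance(category, action, correct(action))  # Correction
        if score(instance) > threshold:               # Consolidation decision
            consolidate(instance)                     # Internalization (queued)


consolidated: list[LearningInstance] = []
run_episode(
    "build a nether portal",
    env_step=lambda action: action != "light_portal",  # toy env: lighting fails
    score=lambda inst: 0.9,                             # toy worthiness score
    threshold=0.5,
    consolidate=consolidated.append,
)
print(consolidated)
```

Queuing instances rather than updating weights inside the loop keeps the fast execution path free of training work; whether PEAM batches consolidation this way is an assumption.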
What Sets PEAM Apart
- Physical isolation of experts: Each category's adapters are stored in separate parameter blocks, which sharply reduces catastrophic forgetting during continual learning.
- Self‑triggered consolidation: No hand‑tuned, per‑task thresholds are needed; the dynamic, scale‑free threshold lets the system decide on its own when enough evidence exists to merit internalization.
- Contrastive internalization of failure: By explicitly learning the difference between a failed action and its correction, the agent builds a richer representation of “what not to do.”
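The article does not say how the scale‑free threshold is computed. One common way to obtain a decision rule that is invariant to the scale of the scores is a robust z‑score: compare each new worthiness score with the median of past scores, measured in median absolute deviations (MADs). The sketch below assumes that rule; the class name, `k`, and `warmup` are hypothetical.

```python
import statistics


class ScaleFreeTrigger:
    """Hypothetical consolidation trigger based on a robust z-score.

    A new worthiness score fires consolidation when it sits more than `k`
    MADs above the median of past scores. Because the median and the MAD
    both scale with the scores, multiplying every score by a positive
    constant leaves the decisions unchanged.
    """

    def __init__(self, k: float = 3.0, warmup: int = 5):
        self.k = k            # measured in MADs, not in raw score units
        self.warmup = warmup  # scores to observe before any decision
        self.history: list[float] = []

    def should_consolidate(self, score: float) -> bool:
        decision = False
        if len(self.history) >= self.warmup:
            med = statistics.median(self.history)
            mad = statistics.median([abs(s - med) for s in self.history])
            if mad > 0:
                decision = (score - med) / mad > self.k
        self.history.append(score)
        return decision


scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90, 0.14]
trigger = ScaleFreeTrigger()
print([trigger.should_consolidate(s) for s in scores])
```

`k` is still a setting, but it is expressed relative to the spread of observed scores, so it does not need re‑tuning when a new task produces scores on a different scale.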
Evaluation & Results
The authors benchmarked PEAM on a suite of long‑horizon Minecraft tasks, ranging from multi‑step crafting pipelines to complex exploration missions. Evaluation focused on three axes:
- Task success rate: PEAM achieved a 27 % higher completion rate on tasks requiring more than 50 sequential steps compared to a retrieval‑only baseline.
- Forgetting mitigation: After consolidating new skills, the MoE retained 92 % of previously learned behaviors, whereas a naïve fine‑tuning approach dropped to 68 %.
- Inference efficiency: Because the fast module handles reflexive execution, average latency per action dropped from 1.8 seconds (retrieval) to 0.3 seconds, a six‑fold speedup.
These results demonstrate that PEAM not only scales better with experience but also learns more robustly by leveraging failure as a teaching signal.
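The reported figures can be checked with a little arithmetic. This sketch assumes "retention" means the fraction of previously learned behaviors still performed correctly, so that one minus retention is the share forgotten; the article does not define the metric precisely, and the function names are illustrative.

```python
def speedup(baseline_latency_s: float, new_latency_s: float) -> float:
    """How many times faster the new system is per action."""
    return baseline_latency_s / new_latency_s


def forgetting(retention: float) -> float:
    """Share of previously learned behavior lost (assumed definition)."""
    return 1.0 - retention


# Figures reported above: 1.8 s vs 0.3 s per action; 92 % vs 68 % retention.
print(round(speedup(1.8, 0.3), 2))                   # 6.0
print(round(forgetting(0.68) / forgetting(0.92), 2))  # 4.0
```

Under that reading, PEAM forgets 8 % of prior behaviors versus 32 % for naïve fine‑tuning, roughly a four‑fold reduction.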
Why This Matters for AI Systems and Agents
PEAM’s hybrid memory design offers a blueprint for building agents that can evolve continuously without sacrificing real‑time responsiveness. For practitioners, the framework suggests several practical takeaways:
- A modular, category‑specific adapter design can simplify the deployment of specialized skills across domains such as robotics, virtual assistants, or game AI.
- Using contrastive learning on failure–correction pairs can improve safety‑critical systems where understanding “what went wrong” is as important as knowing the correct action.
- The scale‑free consolidation trigger reduces the engineering overhead of hand‑tuning thresholds for each new task, accelerating product iteration cycles.
- Fast MoE execution aligns with low‑latency requirements of edge devices, making PEAM relevant for on‑device AI in AR/VR or autonomous drones.
In short, PEAM bridges the gap between the flexibility of retrieval‑based memory and the speed of parametric models, a balance that many enterprise AI deployments struggle to achieve.
What Comes Next
While PEAM marks a significant step forward, several open challenges remain:
- Cross‑domain transfer: Extending the MoE experts beyond Minecraft to other 3D environments will require domain‑agnostic visual encoders.
- Scalable expert management: As the number of categories grows, routing decisions may become a bottleneck; research into dynamic expert selection is needed.
- Human‑in‑the‑loop correction: Incorporating real‑time feedback from users could accelerate skill acquisition in production settings.
- Evaluation standards: Developing benchmark suites that measure both skill retention and adaptation speed will help compare future continual‑learning agents.
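The article leaves open how expert selection should scale. One standard option from the mixture‑of‑experts literature is sparse top‑k gating, where only the k best‑scoring experts run for a given input and their weights are renormalized. The sketch below shows that generic technique, not PEAM's router; the gate scores and category names are made up for illustration, and in practice a learned gate would produce the scores from the agent's state.

```python
import math


def top_k_route(gate_logits: dict[str, float], k: int = 2) -> dict[str, float]:
    """Keep the k highest-scoring experts and renormalize their softmax weights."""
    chosen = sorted(gate_logits, key=gate_logits.__getitem__, reverse=True)[:k]
    m = max(gate_logits[e] for e in chosen)  # subtract max for numerical stability
    exps = {e: math.exp(gate_logits[e] - m) for e in chosen}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


# Hypothetical gate scores for one state.
logits = {"building": 2.0, "combat": 0.5,
          "resource_gathering": 1.0, "navigation": -1.0}
weights = top_k_route(logits, k=2)
print(weights)
```

Because only k experts execute per step, the adapter compute per action stays roughly constant as categories are added, although the gate itself still has to score every expert.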
Future work may also explore pairing PEAM with a chat assistant such as ChatGPT to provide natural‑language debugging assistance, or with workflow‑automation tooling to build automated skill pipelines.
References & Further Reading
Guo, Y., Gong, J., Cai, H., Cheung, Y., & Su, W. (2026). PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft. arXiv preprint.