- Updated: March 11, 2026
HarmonyCell: Automating Single‑Cell Perturbation Modeling under Semantic and Distribution Shifts

HarmonyCell is an end‑to‑end agent framework that addresses two major sources of heterogeneity in single‑cell perturbation studies: semantic heterogeneity (datasets that describe the same metadata under different schemas) and statistical distribution shifts. It combines a Large Language Model‑driven Semantic Unifier with an adaptive Monte Carlo Tree Search (MCTS) engine to produce virtual cell models automatically, without dataset‑specific engineering.
Key Innovations
- Semantic Unifier: Automatically maps disparate metadata schemas into a unified canonical interface, eliminating manual curation.
- Adaptive MCTS: Searches a hierarchical action space to discover optimal model architectures that respect dataset‑specific distribution shifts.
- Robust Execution: Achieves a 95% valid execution rate on heterogeneous inputs (vs. 0% for generic agents).
- Performance: Matches or exceeds expert‑designed baselines in rigorous out‑of‑distribution evaluations.
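To make the Semantic Unifier idea concrete, the sketch below maps differently named metadata columns onto one canonical set of names. It is only an illustration: a static alias table stands in for the LLM, and the canonical field names, the aliases, and the functions `normalize` and `unify_columns` are all assumptions rather than part of HarmonyCell's published interface.

```python
# Hypothetical canonical fields and aliases (assumed for illustration only;
# the article does not publish HarmonyCell's actual schema).
CANONICAL_ALIASES = {
    "perturbation": {"perturbation", "condition", "treatment", "guide_id", "drug"},
    "cell_type": {"cell_type", "celltype", "cell_line"},
    "dose": {"dose", "dosage", "concentration"},
}


def normalize(name):
    """Lowercase a column name and unify separators so spelling variants compare equal."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def unify_columns(columns):
    """Map raw metadata column names to canonical names.

    Returns (mapping, unmapped). A static alias table stands in for the LLM,
    which in the described system resolves terminology conflicts.
    """
    mapping, unmapped = {}, []
    for raw in columns:
        key = normalize(raw)
        for canonical, aliases in CANONICAL_ALIASES.items():
            if key in aliases:
                mapping[raw] = canonical
                break
        else:
            # No alias matched: report the column instead of guessing.
            unmapped.append(raw)
    return mapping, unmapped
```

Calling `unify_columns(["Cell Line", "Treatment", "batch"])` maps the first two columns to `cell_type` and `perturbation` and reports `batch` as unmapped, so unknown fields are surfaced rather than silently dropped.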
Why It Matters
Single‑cell perturbation experiments are essential for understanding cellular responses to genetic or pharmacological interventions. However, the two forms of heterogeneity described above, inconsistent metadata semantics and dataset‑specific distribution shifts, have limited scalability and reproducibility. HarmonyCell’s dual‑track orchestration lets researchers:
- Rapidly integrate new datasets regardless of their original metadata conventions.
- Automatically adapt model inductive biases to the unique statistical properties of each dataset.
- Scale virtual cell modeling to large consortia without bespoke engineering effort.
Technical Details
The framework consists of two tightly coupled components:
1. Semantic Unifier (LLM‑driven) – Parses dataset metadata, resolves terminology conflicts, and generates a standardized schema that downstream modules consume.
2. Monte Carlo Tree Search Engine – Operates over a hierarchical action space representing possible model architectures, training pipelines, and hyper‑parameter configurations. The MCTS algorithm iteratively expands promising branches, evaluates performance on validation splits, and converges on an optimal configuration tailored to the distribution shift.
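The search loop described in item 2 can be sketched as a generic UCT‑style MCTS over a small hierarchical action space. This is a minimal sketch under stated assumptions, not HarmonyCell's implementation: the three levels in `LEVELS`, the exploration constant, and `toy_validation_score` (a stand‑in for training a model and scoring it on a validation split) are all hypothetical.

```python
import math
import random

# Hypothetical three-level action space: architecture, training pipeline, learning rate.
LEVELS = [
    ["mlp", "vae", "transformer"],
    ["standard", "domain_adversarial"],
    [1e-2, 1e-3, 1e-4],
]


class Node:
    def __init__(self, path=(), parent=None):
        self.path = path        # choices made so far, one per level
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.total = 0.0        # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.path) == len(LEVELS)

    def untried(self):
        if self.is_terminal():
            return []
        return [a for a in LEVELS[len(self.path)] if a not in self.children]


def uct_child(node, c=1.4):
    """Return the child with the highest UCT score (mean reward plus exploration bonus)."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(evaluate, iterations=200, seed=0):
    """Run MCTS and return the most-visited path of choices."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
        # Expansion: add one untried action as a new child.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(node.path + (action,), node)
            node.children[action] = child
            node = child
        # Rollout: complete the configuration with random choices.
        path = node.path
        while len(path) < len(LEVELS):
            path += (rng.choice(LEVELS[len(path)]),)
        reward = evaluate(path)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the configuration by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.path


def toy_validation_score(path):
    """Hypothetical stand-in for a validation metric; a real run would train and evaluate."""
    arch, pipeline, lr = path
    return 0.5 * (arch == "vae") + 0.3 * (pipeline == "domain_adversarial") + 0.2 * (lr == 1e-3)
```

Calling `search(toy_validation_score)` returns a tuple with one choice per level. In a real setting the evaluation call dominates the cost, so the iteration budget, not the tree bookkeeping, is the main design constraint.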
Evaluation
HarmonyCell was benchmarked across diverse perturbation tasks involving both semantic and distribution shifts. Results show:
- 95% valid execution on heterogeneous inputs.
- Comparable or superior predictive accuracy to expert‑crafted baselines.
- Significant reduction in manual preprocessing time.
Get Started
Explore the full implementation, tutorials, and API documentation on our internal platform: https://ubos.tech/harmonycell. For integration support, contact the ubos.tech team.
Stay tuned for upcoming releases that will extend HarmonyCell to additional omics modalities and real‑time inference pipelines.