- Updated: March 12, 2026
CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers
Direct Answer
CT‑Flow is an agentic framework that lets large vision‑language models (LVLMs) orchestrate the full, tool‑aware workflow of chest CT interpretation, turning static, single‑pass inference into a dynamic, multi‑step process that invokes measurement, segmentation, and radiomics tools on demand. This matters because it bridges the gap between research‑grade AI and real‑world radiology practice, delivering a 41 % relative improvement in diagnostic accuracy and a 95 % success rate in autonomous tool use.
Background: Why This Problem Is Hard
Radiologists do not read a CT volume in one glance. They scroll through hundreds of slices, toggle window settings, place region‑of‑interest measurements, run segmentation algorithms, and extract quantitative radiomic features before arriving at a diagnosis. Existing AI solutions for 3D imaging typically follow a “closed‑box” paradigm: a model receives the entire volume, produces a report or answer, and exits. This approach suffers from three fundamental limitations:
- Static reasoning: The model cannot adapt its analysis based on intermediate findings, such as requesting a finer measurement after spotting a suspicious nodule.
- Lack of tool integration: Modern PACS environments expose a rich toolbox (e.g., semi‑automatic segmentation, Hounsfield‑unit measurement). Current LVLMs have no native mechanism to call these tools, forcing developers to bake every possible operation into the model’s weights.
- Scalability to 3D data: Processing a full CT volume in a single forward pass is computationally expensive and often forces down‑sampling, which erodes the fine‑grained detail needed for accurate diagnosis.
These bottlenecks mean that, despite impressive performance on benchmark VQA tasks, LVLMs have struggled to gain traction in clinical settings where the workflow is inherently iterative and tool‑centric.
What the Researchers Propose
The authors present CT‑Flow, an agentic orchestration layer built on the Model Context Protocol (MCP). MCP defines a standardized, bidirectional communication contract that lets an LVLM exchange structured requests and responses with external tool servers. In CT‑Flow, the LVLM acts as a high‑level planner, while specialized tool servers (measurement, segmentation, radiomics) act as executors.
Key components include:
- Planner Agent: A large vision‑language model that interprets the radiologist’s natural‑language query, decomposes it into sub‑tasks, and decides which tool to invoke at each step.
- Tool Context Servers: Lightweight micro‑services exposing a uniform API (via MCP) for operations such as slice extraction, region‑of‑interest measurement, and deep‑learning‑based segmentation.
- Orchestration Engine: A runtime that mediates the message flow, tracks state across steps, and ensures that tool outputs are fed back into the planner’s context for subsequent reasoning.
- CT‑FlowBench: A newly curated instruction‑tuning benchmark that pairs 3D CT volumes with multi‑step, tool‑use instructions, enabling systematic evaluation of both diagnostic accuracy and tool‑invocation success.
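The uniform contract between planner and tool servers can be sketched as a small request/response dispatch. This is a hypothetical, simplified stand‑in for MCP's actual JSON‑RPC messages; the tool name `slice_extraction` and the message fields here are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical, simplified stand-in for the MCP contract: a structured
# request from the planner, a structured result from a tool server, and a
# registry that exposes every server behind one uniform call.

@dataclass
class ToolRequest:
    tool: str                                  # e.g. "segmentation"
    arguments: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ToolResult:
    tool: str
    output: Dict[str, Any]
    ok: bool = True

REGISTRY: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}

def serve(name: str):
    """Register a function as a tool server under the uniform API."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@serve("slice_extraction")
def extract_slices(args: Dict[str, Any]) -> Dict[str, Any]:
    # Toy server: return the slice indices the planner asked for.
    return {"indices": list(range(args["start"], args["stop"]))}

def dispatch(req: ToolRequest) -> ToolResult:
    """Route a planner request to the matching tool server."""
    fn = REGISTRY.get(req.tool)
    if fn is None:
        return ToolResult(req.tool, {"error": "unknown tool"}, ok=False)
    return ToolResult(req.tool, fn(req.arguments))

res = dispatch(ToolRequest("slice_extraction", {"start": 40, "stop": 44}))
print(res.output["indices"])  # [40, 41, 42, 43]
```

Because every server sits behind the same `dispatch` entry point, adding a new capability is a registration, not a retraining step, which is the modularity the component list above describes.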
How It Works in Practice
The CT‑Flow workflow can be visualized as a loop of three phases: Interpret → Invoke → Integrate. Below is a conceptual step‑by‑step illustration:
- Interpretation: The planner receives a radiologist’s query, e.g., “Identify any ground‑glass opacities and quantify their volume.” Using its multimodal encoder, it parses the request and generates a structured plan: (a) locate candidate regions, (b) run segmentation, (c) compute volume.
- Invocation: For each sub‑task, the planner emits an MCP request to the appropriate tool server. The segmentation server receives the volume coordinates, runs a 3D UNet, and returns a binary mask. The measurement server then consumes the mask to calculate volume in milliliters.
- Integration: The tool outputs are appended to the planner’s context as new visual and textual tokens. The planner re‑evaluates the updated context, decides whether additional steps are needed (e.g., refine segmentation), and finally synthesizes a natural‑language answer or report.
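The three phases above can be sketched as a small driver loop. The planner, tool servers, and message shapes below are hypothetical stand‑ins for the components the article names, not the paper's actual API.

```python
# A minimal sketch of the Interpret -> Invoke -> Integrate loop described
# above. The planner is any callable that maps the running context to either
# a tool request or a final answer.

def run_ct_flow(planner, tools, query, max_steps=8):
    """Drive the agentic loop until the planner emits a final answer."""
    context = [("user", query)]
    for _ in range(max_steps):
        # Interpret: the planner reads the whole context, proposes an action.
        action = planner(context)
        if "answer" in action:
            return action["answer"]
        # Invoke: route the structured request to the matching tool server.
        output = tools[action["tool"]](action["args"])
        # Integrate: feed the tool output back as new context for re-planning.
        context.append((action["tool"], output))
    return "max steps reached without a final answer"

# Scripted stand-in for the LVLM planner: segment, then measure, then answer.
def scripted_planner(context):
    if len(context) == 1:
        return {"tool": "segment", "args": {"target": "ground-glass opacity"}}
    if len(context) == 2:
        return {"tool": "measure", "args": {"mask": context[-1][1]}}
    return {"answer": f"GGO volume = {context[-1][1]['ml']} mL"}

tools = {
    "segment": lambda args: {"mask_voxels": 120},               # toy mask
    "measure": lambda args: {"ml": args["mask"]["mask_voxels"] / 1000},
}
print(run_ct_flow(scripted_planner, tools, "Quantify any GGO volume"))
# GGO volume = 0.12 mL
```

The `max_steps` cap is a safety rail: an agentic loop that can re‑plan indefinitely needs an explicit termination bound before it is let near a clinical queue.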
What sets CT‑Flow apart is its closed‑loop, tool‑aware design: because tool outputs flow back into the planner's context, the system can dynamically compose new pipelines at inference time, leveraging any MCP‑compliant tool a hospital already has deployed, rather than hard‑coding every possible analysis path. This modularity also reduces the computational burden on the LVLM, which performs only high‑level reasoning while delegating heavy image processing to specialized services.
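For a concrete sense of the heavy lifting being delegated, the measurement step mentioned earlier, turning a binary segmentation mask plus voxel spacing into milliliters, reduces to a few lines. The mask layout and spacing values here are illustrative assumptions, not data from the paper.

```python
# Hypothetical sketch of a measurement server's core computation: a binary
# 3D mask and the scan's voxel spacing determine the segmented volume.

def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary 3D mask in milliliters.

    mask: nested lists [z][y][x] of 0/1 voxels.
    spacing_mm: (dz, dy, dx) voxel spacing in millimeters.
    """
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    count = sum(v for plane in mask for row in plane for v in row)
    return count * voxel_mm3 / 1000.0   # mm^3 -> mL

# Toy 2x2x2 mask with 5 positive voxels at 1 mm isotropic spacing.
mask = [[[1, 0], [1, 1]],
        [[0, 1], [1, 0]]]
print(mask_volume_ml(mask, (1.0, 1.0, 1.0)))  # 0.005
```

Reading the spacing from the scan's metadata rather than assuming isotropic voxels is exactly the kind of bookkeeping a dedicated tool server can own so the planner does not have to.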
Evaluation & Results
To validate the framework, the authors conducted two complementary experiments:
CT‑FlowBench Performance
- Task set: 1,200 instruction‑tuned queries covering diagnosis, measurement, and segmentation across diverse thoracic pathologies.
- Metrics: Diagnostic accuracy (clinical correctness), tool‑invocation success rate, and end‑to‑end latency.
- Findings: CT‑Flow achieved a 41 % relative improvement in diagnostic accuracy over the strongest static LVLM baseline, while successfully invoking the correct tool in 95 % of cases. Average latency remained within clinically acceptable limits (≈3.2 seconds per query).
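Both headline metrics can be reproduced from per‑query logs in a few lines. The log records below are invented for illustration; only the metric definitions (diagnostic accuracy, tool‑invocation success rate) follow the text.

```python
# Sketch of computing the benchmark's two headline metrics from per-query
# logs. Records are fabricated for illustration.

records = [
    {"correct_diagnosis": True,  "tool_calls": 3, "tool_successes": 3},
    {"correct_diagnosis": False, "tool_calls": 2, "tool_successes": 1},
    {"correct_diagnosis": True,  "tool_calls": 1, "tool_successes": 1},
    {"correct_diagnosis": True,  "tool_calls": 2, "tool_successes": 2},
]

# Accuracy is per query; success rate is per tool invocation.
accuracy = sum(r["correct_diagnosis"] for r in records) / len(records)
success_rate = (sum(r["tool_successes"] for r in records)
                / sum(r["tool_calls"] for r in records))

print(f"diagnostic accuracy: {accuracy:.0%}")          # 75%
print(f"tool-invocation success: {success_rate:.1%}")  # 87.5%
```

Note the denominators differ: a single query can contain several tool calls, so the two rates are not directly comparable even when reported side by side.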
Standard 3D VQA Datasets
- Datasets: 3D VQA‑Chest and a public COVID‑19 CT question set.
- Baseline comparison: Static LVLMs, a multi‑stage CNN pipeline, and a human radiologist cohort.
- Results: CT‑Flow matched or exceeded human performance on 78 % of questions and outperformed all AI baselines by a margin of 12–18 % in answer correctness.
These results demonstrate that the agentic, tool‑aware paradigm not only boosts raw accuracy but also yields more reliable, interpretable workflows that align with how radiologists actually operate.
Why This Matters for AI Systems and Agents
CT‑Flow’s success has several practical implications for AI practitioners building autonomous agents in high‑stakes domains:
- Modular orchestration: By decoupling reasoning from execution, developers can reuse existing toolchains (e.g., DICOM viewers, segmentation libraries) without retraining massive models.
- Improved evaluation: The CT‑FlowBench benchmark provides a concrete yardstick for measuring both reasoning quality and tool‑use proficiency, encouraging the community to adopt more holistic metrics.
- Scalable deployment: The MCP interface is language‑agnostic and can be wrapped around any micro‑service, making it straightforward to integrate with cloud‑native radiology platforms.
- Safety and auditability: Each tool invocation is logged as a discrete, inspectable event, offering a transparent audit trail that regulators and clinicians can review.
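The audit‑trail idea in the last bullet can be sketched as a thin wrapper around tool dispatch, so that every invocation becomes a discrete, timestamped, inspectable record. The field names are assumptions for illustration, not a prescribed schema.

```python
import time
import uuid

# Hypothetical audit wrapper: every tool call, its arguments, and its
# outcome are appended to an append-only log before the result is returned.

AUDIT_LOG = []

def audited(tool_name, fn):
    """Wrap a tool server so each call appends an audit record."""
    def call(args):
        event = {
            "event_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "tool": tool_name,
            "arguments": args,
        }
        try:
            event["output"] = fn(args)
            event["status"] = "ok"
        except Exception as exc:
            event["status"] = f"error: {exc}"
            raise
        finally:
            AUDIT_LOG.append(event)   # logged even when the tool fails
        return event.get("output")
    return call

measure = audited("measurement", lambda a: {"ml": a["voxels"] / 1000})
measure({"voxels": 500})
print(AUDIT_LOG[-1]["status"])  # ok
```

Because failures are logged before the exception propagates, the trail stays complete even for the calls a regulator would most want to see.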
For teams building agentic AI, CT‑Flow serves as a reference architecture that demonstrates how to move beyond “one‑shot” inference toward truly interactive, tool‑driven intelligence.
What Comes Next
While CT‑Flow marks a significant step forward, several challenges remain:
- Generalization to other modalities: Extending MCP to MRI, PET, or ultrasound will require modality‑specific tool servers and possibly new planning heuristics.
- Robustness to noisy inputs: Real‑world PACS data can contain artifacts, missing slices, or inconsistent metadata. Future work should incorporate uncertainty estimation into the planner’s decision‑making.
- Human‑in‑the‑loop refinement: Allowing radiologists to intervene, correct, or re‑prioritize tool calls could further improve trust and performance.
- Regulatory pathways: Demonstrating clinical safety at scale will involve prospective trials and alignment with FDA/EMA guidelines for AI‑enabled medical devices.
Potential applications beyond radiology include pathology slide analysis, surgical navigation, and any domain where experts rely on a suite of specialized tools to interpret high‑dimensional data.
References
Original research: CT‑Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers