- Updated: March 11, 2026
- 7 min read
MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval
Direct Answer
MED‑COPILOT is an interactive clinical decision‑support system that blends guideline‑driven GraphRAG retrieval with a hybrid semantic‑keyword similar‑patient engine. By grounding large language model (LLM) reasoning in both structured medical guidelines and analogical patient cases, it delivers more accurate, transparent, and evidence‑aware recommendations for clinicians and trainees.
Background: Why This Problem Is Hard
Clinical decision‑making is a synthesis problem at scale. Physicians must weigh a patient’s longitudinal history, current symptoms, and the latest evidence from guidelines such as WHO or NICE. Traditional electronic health record (EHR) tools excel at storing data but fall short when it comes to surfacing the right evidence at the right moment.
Recent advances in LLMs have shown impressive reasoning abilities, yet they suffer from two critical shortcomings in the medical domain:
- Hallucination risk: LLMs can generate plausible‑sounding text that is not anchored in any real source, a dangerous flaw when advising on treatment.
- Context overload: Medical documents—guidelines, lab reports, imaging narratives—often exceed the token windows of even the largest models, forcing truncation or loss of nuance.
Retrieval‑augmented generation (RAG) attempts to mitigate hallucinations by pulling in external documents, but standard RAG pipelines treat all retrieved text as a flat bag of words. They ignore the hierarchical, relational nature of clinical knowledge (e.g., “if a patient has hypertension AND chronic kidney disease, then ACE‑inhibitors are contraindicated”). Moreover, existing similar‑patient retrieval systems typically rely on pure vector similarity, which can miss critical keyword matches (e.g., a rare drug interaction mentioned only in a few notes).
These gaps leave a practical bottleneck: clinicians need a system that can (1) reliably retrieve structured guideline evidence, (2) surface analogous patient cases that reflect real‑world outcomes, and (3) present the combined evidence in a way that the LLM can reason over without hallucinating.
What the Researchers Propose
The authors introduce MED‑COPILOT, a hybrid architecture that unites three core agents:
- Guideline‑Grounded GraphRAG: A knowledge graph built from WHO and NICE guidelines, where nodes represent clinical concepts (diagnoses, interventions, contraindications) and edges encode logical relationships. Community‑level summarization compresses the graph for fast retrieval.
- Similar‑Patient Retrieval Engine: A dual‑mode search that blends dense semantic embeddings with keyword‑based Boolean filters, drawing from a curated 36,000‑case database derived from SOAP‑normalized MIMIC‑IV notes and synthetic Synthea records.
- LLM Reasoning Layer: A state‑of‑the‑art LLM (e.g., GPT‑4‑Turbo) that receives both the guideline subgraph and the top‑k patient analogues, then generates a response while explicitly citing the retrieved evidence.
By feeding the LLM a structured, evidence‑rich context, MED‑COPILOT aims to reduce hallucinations, improve factual fidelity, and make the reasoning process auditable for clinicians.
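The three agents above can be sketched as a small orchestration pipeline. This is an illustrative sketch only: the function names, the `Evidence` dataclass, and the toy return values are assumptions, not the paper's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the three-agent pipeline. Names, data
# structures, and example strings are illustrative, not MED-COPILOT's
# real interfaces.

@dataclass
class Evidence:
    tag: str   # e.g. "[GUIDELINE]" or "[PATIENT #1]"
    text: str

def retrieve_guideline_subgraph(query: str) -> list[Evidence]:
    # Stand-in for GraphRAG community-level retrieval.
    return [Evidence("[GUIDELINE]",
                     "CKD stage 3-4: consider dose-adjusted warfarin.")]

def retrieve_similar_patients(query: str, k: int = 5) -> list[Evidence]:
    # Stand-in for the hybrid semantic + keyword engine.
    cases = ["68M, AF + CKD stage 3, stable on reduced-dose warfarin."]
    return [Evidence(f"[PATIENT #{i + 1}]", c) for i, c in enumerate(cases[:k])]

def build_context(query: str) -> str:
    # Evidence fusion: tag each source, then append the question so
    # the LLM can cite [GUIDELINE] / [PATIENT #k] explicitly.
    evidence = retrieve_guideline_subgraph(query) + retrieve_similar_patients(query)
    lines = [f"{e.tag} {e.text}" for e in evidence]
    return "\n".join(lines + [f"QUESTION: {query}"])

print(build_context("Anticoagulation for AF with CKD?"))
```

The tagged context string is what the LLM reasoning layer would consume; in a real deployment the stubs would call the graph index, the case database, and a model endpoint.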
How It Works in Practice
Conceptual Workflow
- User Prompt: A clinician enters a query (e.g., “Best anticoagulation strategy for a 68‑year‑old with atrial fibrillation and chronic kidney disease”).
- Guideline Retrieval: The system queries the GraphRAG using a semantic match to locate the relevant guideline subgraph. Community‑level summarization returns a concise, relational snippet (e.g., “CKD stage 3–4: avoid direct oral anticoagulants; consider dose‑adjusted warfarin”).
- Similar‑Patient Search: Simultaneously, the hybrid engine searches the patient case database. Dense embeddings capture overall similarity, while keyword filters ensure inclusion of critical terms like “CKD” and “warfarin”. The top‑5 analogues are returned with outcome summaries.
- Evidence Fusion: Both the guideline snippet and the patient cases are packaged into a prompt template that tags each source (e.g., [GUIDELINE], [PATIENT #1]). Token‑level similarity visualizations are generated for transparency.
- LLM Generation: The LLM produces a recommendation, explicitly citing the sources (e.g., “According to NICE guideline NG136 [GUIDELINE] and the outcome of patient #3 [PATIENT #3], a reduced‑dose warfarin regimen is advisable”).
- Interactive Follow‑up: Users can request deeper dives (e.g., “Show the full guideline paragraph”) or explore alternative cases, prompting the system to retrieve additional evidence on demand.
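The similar‑patient step above combines two signals: dense similarity for overall resemblance and a hard keyword filter for safety‑critical terms. A minimal sketch, assuming toy hand‑made vectors in place of a real sentence encoder (the case notes, vectors, and function names are all hypothetical):

```python
import math

# Illustrative hybrid retrieval: a hard keyword filter guarantees
# critical terms are present, then dense similarity ranks the
# survivors. The "embeddings" are hand-made toy vectors.

CASES = [
    {"id": 1, "note": "AF with CKD stage 3, managed on warfarin", "vec": [0.9, 0.1, 0.3]},
    {"id": 2, "note": "AF, normal renal function, apixaban",      "vec": [0.8, 0.6, 0.1]},
    {"id": 3, "note": "DVT, normal renal function, on warfarin",  "vec": [0.2, 0.9, 0.4]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_search(query_vec, required_keywords, k=5):
    # Boolean keyword filter first ...
    hits = [c for c in CASES
            if all(kw.lower() in c["note"].lower() for kw in required_keywords)]
    # ... then semantic ranking of the survivors.
    hits.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return hits[:k]

top = hybrid_search([1.0, 0.2, 0.3], ["CKD", "warfarin"])
print([c["id"] for c in top])  # only case 1 passes both keyword constraints
```

A pure vector search might have ranked case 2 highly despite its missing renal context; the keyword constraint is what enforces the explicit safety signal.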
Component Interaction Diagram
While a visual diagram is not rendered here, imagine a simple loop: Prompt → Retrieval (GraphRAG + Similar‑Patient) → LLM → Response, with a feedback channel that lets the user request more evidence, causing the loop to iterate.
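That loop, including the feedback channel, can be sketched in a few lines. `retrieve` and `generate` below are hypothetical stubs standing in for the retrieval agents and the LLM call; only the control flow mirrors the description above.

```python
# Minimal sketch of the Prompt -> Retrieval -> LLM -> Response loop.
# retrieve() and generate() are illustrative stubs, not real APIs.

def retrieve(query: str) -> list[str]:
    # Stand-in for GraphRAG + similar-patient retrieval.
    return [f"[EVIDENCE] retrieved for: {query}"]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM call over the tagged evidence context.
    return f"Answer to '{query}' citing {len(context)} evidence item(s)."

def run_session(query: str, followups: list[str]) -> list[str]:
    context = retrieve(query)
    responses = [generate(query, context)]
    for follow in followups:          # feedback channel: iterate the loop
        context += retrieve(follow)   # fetch additional evidence on demand
        responses.append(generate(follow, context))
    return responses

out = run_session("AF + CKD anticoagulation", ["show full guideline paragraph"])
```

Each follow‑up grows the evidence context rather than restarting it, which is what makes the interaction a loop rather than a sequence of independent queries.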
What Sets This Approach Apart
- Structured Guideline Knowledge: Unlike flat text retrieval, GraphRAG preserves logical relationships, enabling the LLM to reason about conditional statements (“if… then…”).
- Hybrid Similar‑Patient Search: The combination of semantic similarity and keyword constraints captures both latent clinical patterns and explicit safety signals.
- Evidence Transparency: Every generated claim is traceable to a specific node in the guideline graph or a concrete patient record, reducing the black‑box nature of LLM outputs.
- Scalable Database: The 36k‑case repository blends real‑world ICU notes (MIMIC‑IV) with synthetic, fully annotated Synthea records, offering breadth without compromising privacy.
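The conditional reasoning that GraphRAG enables can be illustrated with a toy guideline graph, where each edge carries the conditions under which its relation applies. The graph contents, node names, and condition keys below are invented for illustration; the paper builds its graph from WHO and NICE guideline text.

```python
# Toy guideline graph: nodes are clinical concepts, edges carry a
# relation plus the conditions under which it applies. Purely
# illustrative content, not actual guideline logic.

GRAPH = {
    "atrial_fibrillation": [
        # (relation, target, conditions that must all hold)
        ("recommends", "DOAC", {"ckd_stage_3_4": False}),
        ("recommends", "dose_adjusted_warfarin", {"ckd_stage_3_4": True}),
    ],
}

def applicable_recommendations(diagnosis: str, patient_facts: dict):
    # Walk the node's edges and keep only those whose conditional
    # ("if ... then ...") premises match the patient's facts.
    out = []
    for relation, target, conditions in GRAPH.get(diagnosis, []):
        if all(patient_facts.get(k) == v for k, v in conditions.items()):
            out.append((relation, target))
    return out

print(applicable_recommendations("atrial_fibrillation", {"ckd_stage_3_4": True}))
# -> [('recommends', 'dose_adjusted_warfarin')]
```

A flat‑text retriever would return both recommendation sentences and leave the disambiguation to the LLM; the graph structure lets the system filter on the condition before the model ever sees the evidence.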
Evaluation & Results
Test Scenarios
The authors benchmarked MED‑COPILOT on two tasks:
- Clinical Note Completion: Given a partially written discharge summary, the system must fill in missing sections (e.g., medication plan) while staying faithful to the patient’s history and relevant guidelines.
- Medical Question Answering (MQA): A set of 500 real‑world clinician questions covering diagnosis, treatment, and drug interactions, sourced from a public medical QA dataset.
Key Findings
- Fidelity Boost: MED‑COPILOT reduced factual errors by 38% compared to a baseline LLM with standard RAG, as measured by expert clinician review.
- Reasoning Accuracy: On the MQA benchmark, the system achieved a 12‑point lift in exact‑match accuracy over the parametric LLM, primarily due to correct guideline citations.
- Transparency Scores: User studies showed a 27% increase in trust when participants could view the provenance of each recommendation (graph node or patient case).
- Latency: End‑to‑end response time averaged 2.8 seconds, well within interactive use‑case thresholds, thanks to community‑level summarization and efficient hybrid indexing.
Why the Results Matter
These outcomes demonstrate that integrating structured guideline graphs with analogical patient evidence can materially improve both the correctness and the perceived reliability of LLM‑driven clinical assistants. For AI practitioners, the study provides a concrete blueprint for marrying symbolic knowledge representations with neural retrieval, a hybrid paradigm that may generalize beyond medicine.
Why This Matters for AI Systems and Agents
MED‑COPILOT’s architecture addresses three pain points that frequently surface in enterprise‑grade AI agents:
- Hallucination Mitigation: By anchoring generation in verifiable sources, the system offers a template for building “evidence‑first” agents in finance, law, or any regulated sector.
- Modular Retrieval Pipelines: The separation of guideline graph retrieval and similar‑patient search illustrates how heterogeneous knowledge bases can be orchestrated within a single agent workflow.
- Human‑in‑the‑Loop Transparency: Token‑level similarity visualizations and explicit citations empower end‑users to audit AI decisions, a prerequisite for compliance frameworks such as FDA’s AI/ML Software as a Medical Device (SaMD) guidance.
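As a crude stand‑in for the token‑level similarity visualizations mentioned above, one can mark which query tokens also occur in a retrieved source. This naive exact‑match sketch is an assumption for illustration; the paper's visualizations presumably operate on model similarity scores rather than string overlap.

```python
# Naive token-overlap "visualization": wrap query tokens that also
# appear in a retrieved source in asterisks. Illustrative only; a
# real system would surface model-derived similarity scores.

def highlight_overlap(query: str, source: str) -> str:
    source_tokens = {t.strip(".,").lower() for t in source.split()}
    return " ".join(
        f"*{tok}*" if tok.strip(".,").lower() in source_tokens else tok
        for tok in query.split()
    )

q = "warfarin dosing in CKD"
s = "[GUIDELINE] CKD stage 3-4: consider dose-adjusted warfarin."
print(highlight_overlap(q, s))  # -> *warfarin* dosing in *CKD*
```

Even this trivial overlay shows why provenance matters for auditability: the user can see at a glance which parts of their question are actually grounded in the cited evidence.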
Developers building multi‑modal assistants can adopt the GraphRAG component via ubos.tech/graphrag, while the hybrid patient‑case engine is available at ubos.tech/similar-patient. These internal resources provide ready‑to‑integrate APIs that mirror the paper’s design, accelerating prototyping of evidence‑grounded agents.
What Comes Next
Despite its promising results, MED‑COPILOT has several open challenges:
- Guideline Currency: Medical guidelines evolve; maintaining an up‑to‑date graph requires automated ingestion pipelines and version control.
- Patient Privacy & Bias: While the synthetic Synthea component mitigates privacy concerns, real‑world case databases may embed demographic biases that need systematic auditing.
- Scalability to Multi‑Specialty Domains: Extending the graph to cover oncology, pediatrics, or mental health will demand larger ontologies and more nuanced relationship modeling.
- Integration with EHR Systems: Seamless, standards‑based (FHIR) connectivity is essential for real‑time bedside use.
Future research could explore:
- Dynamic graph updates driven by continuous guideline monitoring services.
- Federated learning approaches that allow hospitals to contribute anonymized case data without centralizing raw records.
- Cross‑modal retrieval that incorporates imaging reports or genomics alongside textual notes.
For teams interested in experimenting with the next generation of clinical assistants, the demo is live at ubos.tech/demo. The open‑source repository also includes scripts for building custom guideline graphs, enabling rapid adaptation to local protocols.
Call to Action
Ready to see evidence‑aware AI in action? Try the MED‑COPILOT demo now and explore how structured guidelines and real‑world patient analogues can elevate clinical reasoning.
References
For a complete technical description, see the original pre‑print: MED‑COPILOT paper.