- Updated: January 30, 2026
- 6 min read
STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification
Direct Answer
The STELLAR framework introduces a structure‑guided pipeline that automatically extracts hardware design fingerprints, retrieves relevant SystemVerilog Assertion (SVA) templates from a curated knowledge base, and uses large language models (LLMs) to generate precise, context‑aware assertions for formal verification. By grounding LLM prompts in the syntactic structure of the RTL code, STELLAR sharply reduces manual effort while improving the relevance and correctness of generated SVAs, accelerating verification cycles for complex silicon projects.
Background: Why This Problem Is Hard
Formal verification remains a cornerstone of hardware reliability, yet its adoption is hampered by the labor‑intensive creation of SystemVerilog Assertions. Engineers must understand intricate timing relationships, signal dependencies, and design intent to write assertions that are both sound and comprehensive. Traditional approaches rely on manual authoring or rule‑based generators that struggle to capture nuanced design semantics, leading to incomplete coverage or false positives.
Recent attempts to harness LLMs for SVA generation promise automation, but they encounter two fundamental obstacles:
- Contextual Blindness: Generic LLM prompts lack awareness of the specific structural hierarchy of RTL code, causing generated assertions to miss critical paths or reference nonexistent signals.
- Knowledge Gap: Off‑the‑shelf LLMs are trained on broad corpora and do not contain a curated repository of verified RTL‑SVA pairs, resulting in low precision and the need for extensive post‑generation editing.
These limitations translate into verification bottlenecks, longer time‑to‑market, and higher risk of silicon bugs slipping through.
What the Researchers Propose
STELLAR (Structure‑guided LLM Assertion Retrieval and Generation) tackles the problem by marrying three complementary components:
- Structural Fingerprint Extraction: A lightweight static analysis pass over the RTL design produces a concise, hierarchical fingerprint that captures module interfaces, signal naming patterns, and data‑flow relationships.
- Knowledge‑Base Retrieval: The fingerprint is used as a query key to fetch the most relevant pre‑validated RTL‑SVA pairs from a domain‑specific repository built from open‑source and proprietary verification assets.
- Structure‑Guided Prompt Engineering: Retrieved examples are embedded into a dynamic prompt that conditions the LLM to generate new assertions aligned with the target design’s topology.
By feeding the LLM with concrete, design‑specific evidence rather than a generic description, STELLAR ensures that the generated SVAs respect the actual hardware structure and adhere to proven verification patterns.
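To make the first component concrete, here is a minimal sketch of what a structural fingerprint extractor might look like. This is an illustration only, not the paper's implementation: a production pass would use a full Verilog/SystemVerilog parser, whereas this sketch uses simple regular expressions and captures only module names and port lists. All function and field names are assumptions.

```python
import json
import re

# Hypothetical, simplified fingerprint extractor. A real static-analysis
# pass would use a proper HDL parser; here two regexes capture only the
# module name and its port declarations.
MODULE_RE = re.compile(r"module\s+(\w+)\s*\(([^)]*)\)")
PORT_RE = re.compile(
    r"(input|output|inout)\s+(?:wire|reg|logic)?\s*(?:\[[^\]]*\]\s*)?(\w+)"
)

def extract_fingerprint(rtl_source: str) -> dict:
    """Build a minimal structural fingerprint (as a dict) from RTL text."""
    fingerprint = {"modules": []}
    for match in MODULE_RE.finditer(rtl_source):
        name, port_block = match.groups()
        ports = [{"dir": d, "name": n} for d, n in PORT_RE.findall(port_block)]
        fingerprint["modules"].append({"name": name, "ports": ports})
    return fingerprint

rtl = """
module fifo(input wire clk, input wire rst_n,
            input wire [7:0] din, output reg [7:0] dout);
endmodule
"""
print(json.dumps(extract_fingerprint(rtl), indent=2))
```

The JSON output of such a pass is what downstream stages would consume as the design's query key.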
How It Works in Practice
Conceptual Workflow
The end‑to‑end process can be visualized as a four‑stage pipeline:
| Stage | Input | Output | Key Action |
|---|---|---|---|
| 1. Fingerprint Generation | RTL source files (Verilog/SystemVerilog) | Structural fingerprint (JSON) | Parse module hierarchy, extract port lists, signal types, and connectivity patterns. |
| 2. Retrieval Engine | Fingerprint + Knowledge base index | Top‑K matching RTL‑SVA pairs | Similarity search using graph‑based metrics to rank relevance. |
| 3. Prompt Construction | Retrieved pairs + target fingerprint | LLM‑ready prompt | Inject examples, annotate with design‑specific placeholders, and specify assertion intent. |
| 4. LLM Generation & Post‑Processing | Prompt + LLM (e.g., GPT‑4, Claude) | Generated SVAs | Validate syntax, resolve signal names, and optionally run a quick formal check for trivial contradictions. |
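Stage 2 of the table can be sketched as follows. Note the hedge: the paper describes graph‑based similarity metrics, while this stand‑in uses simple Jaccard overlap of flattened fingerprint features; the entry schema and function names are invented for illustration.

```python
# Illustrative stand-in for the retrieval engine (Stage 2). Jaccard overlap
# of (direction, port-name) features substitutes for the graph-based
# metrics described in the article; all names here are hypothetical.

def features(fingerprint: dict) -> set:
    """Flatten a fingerprint into a comparable set of port features."""
    feats = set()
    for mod in fingerprint.get("modules", []):
        for port in mod.get("ports", []):
            feats.add((port["dir"], port["name"]))
    return feats

def top_k(target: dict, knowledge_base: list, k: int = 3) -> list:
    """Return the k knowledge-base entries most similar to the target."""
    tf = features(target)
    def score(entry):
        ef = features(entry["fingerprint"])
        union = tf | ef
        return len(tf & ef) / len(union) if union else 0.0
    return sorted(knowledge_base, key=score, reverse=True)[:k]

kb = [
    {"id": "fifo_like",
     "fingerprint": {"modules": [{"name": "q", "ports": [
         {"dir": "input", "name": "clk"}, {"dir": "input", "name": "rst_n"}]}]},
     "sva": "assert property (@(posedge clk) disable iff (!rst_n) !overflow);"},
    {"id": "alu_like",
     "fingerprint": {"modules": [{"name": "alu", "ports": [
         {"dir": "input", "name": "opcode"}]}]},
     "sva": "assert property (@(posedge clk) valid_op |-> !err);"},
]
target = {"modules": [{"name": "fifo", "ports": [
    {"dir": "input", "name": "clk"}, {"dir": "input", "name": "rst_n"},
    {"dir": "output", "name": "dout"}]}]}
print(top_k(target, kb, k=1)[0]["id"])  # fifo_like
```

Swapping the scoring function for an embedding‑based or graph‑kernel metric leaves the rest of the pipeline unchanged, which is one reason the retrieve‑then‑generate split is attractive.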
Component Interactions
Each component communicates through well‑defined JSON contracts, enabling seamless integration into existing verification toolchains such as JasperGold or Cadence Incisive. The fingerprint module can be invoked as a pre‑compile step, while the retrieval engine leverages an Elasticsearch‑backed vector store for fast similarity look‑ups. Prompt construction is handled by a lightweight templating engine that respects user‑defined style guides (e.g., naming conventions, assertion categories).
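A templating engine of the kind described above might assemble prompts along these lines. The template wording, placeholder names, and intent phrasing are assumptions for illustration, not the paper's actual prompt.

```python
# Minimal prompt-assembly sketch (Stage 3). The template shape and field
# names are hypothetical; a real system would also inject style-guide
# constraints (naming conventions, assertion categories).
PROMPT_TEMPLATE = """You are generating SystemVerilog Assertions.

Reference examples (structurally similar designs):
{examples}

Target design fingerprint (JSON):
{fingerprint}

Write SVAs for the target design that follow the patterns above.
Intent: {intent}
"""

def build_prompt(fingerprint_json: str, retrieved_pairs: list, intent: str) -> str:
    """Embed retrieved RTL-SVA pairs and the target fingerprint in a prompt."""
    examples = "\n\n".join(
        f"// RTL:\n{p['rtl']}\n// SVA:\n{p['sva']}" for p in retrieved_pairs
    )
    return PROMPT_TEMPLATE.format(
        examples=examples, fingerprint=fingerprint_json, intent=intent
    )

pairs = [{"rtl": "module m(input clk); endmodule",
          "sva": "assert property (@(posedge clk) 1);"}]
prompt = build_prompt('{"modules": []}', pairs, "reset behavior")
print(prompt)
```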
What Sets STELLAR Apart
- Design‑Aware Retrieval: Instead of feeding the LLM raw code, STELLAR supplies concrete, high‑quality examples that share structural motifs with the target design.
- Reduced Hallucination: The grounding effect of retrieved pairs dramatically lowers the incidence of spurious signals or ill‑formed temporal operators.
- Scalable Knowledge Base: New RTL‑SVA pairs can be added incrementally, allowing the system to evolve with the organization’s verification assets.
Evaluation & Results
STELLAR was benchmarked on three open‑source hardware suites (RocketChip, OpenRISC, and a RISC‑V core) and compared against two baselines: (1) a naïve LLM prompt that only receives raw RTL, and (2) a rule‑based SVA generator.
Test Scenarios
- Coverage Depth: Percentage of design properties (e.g., reset behavior, data‑path invariants) captured by generated assertions.
- False Positive Rate: Instances where an assertion incorrectly flags a correct design behavior.
- Engineer Effort: Manual editing time required after generation.
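The first two metrics reduce to simple set and ratio computations. The sketch below uses invented property names and counts purely to show how such numbers would be derived.

```python
# Toy computation of the coverage and false-positive metrics; the property
# names and counts are invented for illustration only.

def coverage_depth(captured: set, all_properties: set) -> float:
    """Fraction of known design properties hit by generated assertions."""
    return len(set(captured) & set(all_properties)) / len(set(all_properties))

def false_positive_rate(num_spurious: int, num_assertions: int) -> float:
    """Share of generated assertions that flag correct behavior as a bug."""
    return num_spurious / num_assertions

props = {"reset_clears_fifo", "no_overflow", "ack_follows_req", "data_stable"}
captured = {"reset_clears_fifo", "no_overflow", "ack_follows_req"}
print(coverage_depth(captured, props))   # 0.75
print(false_positive_rate(1, 20))        # 0.05
```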
Key Findings
- STELLAR achieved an average coverage increase of 42 % over the naïve LLM baseline and 68 % over rule‑based generators.
- False positives dropped from 15 % (naïve LLM) to under 3 % with STELLAR, matching the rate of hand‑crafted assertions.
- Post‑generation editing time was reduced by a factor of 4, translating to roughly 2 hours saved per 1,000 lines of RTL.
These results demonstrate that structural grounding not only improves the relevance of generated SVAs but also makes the workflow practical for large‑scale designs where manual assertion writing is a major cost driver.
Why This Matters for AI Systems and Agents
For verification engineers and AI‑driven tool developers, STELLAR offers a concrete pathway to embed domain expertise into generative models without sacrificing flexibility. The framework illustrates how a hybrid approach—combining symbolic analysis (fingerprints) with probabilistic generation (LLMs)—can overcome the “knowledge gap” that typically plagues AI assistants in highly specialized engineering domains.
Practical implications include:
- Accelerated Verification Pipelines: Early‑stage assertion scaffolding enables faster formal runs, catching bugs before simulation.
- Standardization Across Teams: By pulling from a shared knowledge base, organizations can enforce consistent assertion styles and coverage goals.
- Enhanced AI Agent Design: STELLAR’s pattern of “retrieve‑then‑generate” can be generalized to other hardware‑centric agents, such as automated test‑bench synthesis or RTL refactoring assistants.
What Comes Next
While STELLAR marks a significant step forward, several avenues remain open for exploration:
- Dynamic Context Integration: Extending fingerprints to capture runtime simulation traces could enable assertions that adapt to observed corner cases.
- Multi‑Model Ensembles: Combining multiple LLMs with complementary strengths (e.g., code‑centric vs. reasoning‑centric) may further boost generation quality.
- Cross‑Domain Knowledge Transfer: Applying the retrieve‑then‑generate paradigm to other EDA tasks such as power‑aware synthesis or timing closure.
- Human‑in‑the‑Loop Feedback Loops: Incorporating engineer corrections back into the knowledge base to continuously refine retrieval relevance.
Organizations interested in adopting STELLAR can start by integrating the fingerprint extraction module into their existing build scripts and populating a private RTL‑SVA repository. Over time, the system will learn from the organization’s verification history, delivering increasingly precise assertions.
For a deeper dive into the methodology and experimental details, consult the original paper.
