- Updated: June 25, 2026
AutoACSL: Synthesizing ACSL Specifications by Integrating LLMs with CPG-Based Static Analysis
Direct Answer
AutoACSL is a hybrid framework that automatically generates ACSL (ANSI/ISO C Specification Language) contracts for C programs by combining large language model (LLM) prompting with semantic information extracted from Code Property Graphs (CPGs). It matters because it bridges the gap between AI‑driven code synthesis and the rigorous, machine‑checkable specifications required for formal verification, dramatically reducing manual effort and expertise barriers.
Background: Why This Problem Is Hard
Formal verification of safety‑critical C code relies on precise contracts written in ACSL. These contracts state pre‑conditions, post‑conditions, and loop invariants that deductive verifiers such as Frama‑C's WP plug‑in use to prove correctness. In practice, writing ACSL specifications is labor‑intensive and error‑prone, and it demands deep knowledge of both the codebase and the verification toolchain. Consequently, many projects forgo formal methods despite their proven benefits.
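To make this vocabulary concrete, here is a small ACSL‑annotated function of the kind such contracts describe. It is our own illustration, not an example from the paper: `count_zeros` is a hypothetical function, and although the contract is intended to be dischargeable by WP (for instance via `frama-c -wp -wp-rte`), treat it as a sketch. The `requires` clause is the pre‑condition, `assigns` and `ensures` make up the post‑condition, and the `loop` annotations give the invariants plus a termination measure.

```c
#include <stddef.h>

/* Hypothetical example: count the zero bytes in buf[0 .. len-1].
   The ACSL contract sits in a special comment, so a plain C compiler
   ignores it; only Frama-C reads it. */

/*@ requires \valid_read(buf + (0 .. len - 1));
    assigns \nothing;
    ensures \result <= len;
*/
size_t count_zeros(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    /* Invariants bound i and count; the variant len - i shows termination. */
    /*@ loop invariant 0 <= i <= len;
        loop invariant count <= i;
        loop assigns i, count;
        loop variant len - i;
    */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0)
            count++;
    }
    return count;
}
```

Because ACSL arithmetic is over mathematical integers, `0 .. len - 1` is simply an empty range when `len` is 0, so the pre‑condition does not force a valid pointer for empty input.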
Recent attempts to automate specification synthesis have turned to large language models because of their impressive natural‑language understanding and code generation capabilities. However, LLM‑only approaches typically treat source code as plain text, ignoring the rich control‑flow, data‑flow, and type information that static analysis can provide. The result is often a contract that looks plausible but fails to compile, is semantically inconsistent, or cannot be discharged by a verifier.
Thus, the core difficulty lies in reconciling two divergent strengths: the flexible, pattern‑recognizing power of LLMs and the exact, graph‑based semantics that static analysis tools extract from source code. A solution must preserve the LLM's ability to draft specifications fluently while grounding them in the concrete program behavior captured by a CPG.
What the Researchers Propose
AutoACSL introduces a three‑stage pipeline that tightly couples LLM prompting with CPG‑derived semantic features. The framework consists of:
- Semantic Extraction Layer: A static analysis engine builds a Code Property Graph for each target function, exposing nodes such as variable declarations, control‑flow branches, and pointer aliasing relationships.
- LLM Prompt Engine: The extracted graph attributes are transformed into a structured prompt that guides a large language model to produce ACSL clauses aligned with the observed semantics.
- Feedback Loop: The generated contract is fed back into the verifier (Frama‑C/WP). If the proof fails, the system automatically refines the prompt with counterexample information and iterates until the contract is either verified or a predefined timeout is reached.
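The feedback loop can be sketched in C as a generate‑verify‑refine cycle. This is a minimal sketch under our own assumptions, not AutoACSL's implementation: `refine_contract`, `VerifyResult`, and the two hook types are hypothetical names, a fixed round budget stands in for the timeout, and the stub hooks merely simulate an LLM and Frama‑C/WP so that the loop runs on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TEXT 512

/* Hypothetical outcome of one verifier run. */
typedef struct {
    bool proved;              /* true if every proof obligation was discharged */
    char feedback[MAX_TEXT];  /* failed goals / counterexample summary otherwise */
} VerifyResult;

/* Hypothetical hooks: a real system would call an LLM and Frama-C/WP here. */
typedef void (*GenerateFn)(const char *prompt, char *contract_out, size_t cap);
typedef VerifyResult (*VerifyFn)(const char *contract);

/* Generate, verify, and refine until a contract is proved or the round
   budget (standing in for a timeout) runs out. Returns the round in which
   the contract was proved, or 0 if none was. */
int refine_contract(const char *base_prompt, GenerateFn generate,
                    VerifyFn verify, int max_rounds,
                    char *contract_out, size_t cap)
{
    char prompt[4 * MAX_TEXT];
    snprintf(prompt, sizeof prompt, "%s", base_prompt);

    for (int round = 1; round <= max_rounds; round++) {
        generate(prompt, contract_out, cap);
        VerifyResult r = verify(contract_out);
        if (r.proved)
            return round;
        /* Next prompt = original task + failed attempt + verifier feedback
           (silently truncated if it exceeds the buffer). */
        snprintf(prompt, sizeof prompt,
                 "%s\nPrevious attempt:\n%s\nVerifier feedback:\n%s",
                 base_prompt, contract_out, r.feedback);
    }
    return 0;
}

/* Stub LLM: adds the missing validity pre-condition once feedback arrives. */
static void stub_generate(const char *prompt, char *out, size_t cap)
{
    if (strstr(prompt, "Verifier feedback") != NULL)
        snprintf(out, cap, "requires \\valid(p); assigns *p; ensures *p == \\old(*p) + 1;");
    else
        snprintf(out, cap, "assigns *p; ensures *p == \\old(*p) + 1;");
}

/* Stub verifier: "proves" the contract only if it requires \valid(p). */
static VerifyResult stub_verify(const char *contract)
{
    VerifyResult r = { .proved = strstr(contract, "\\valid(p)") != NULL };
    snprintf(r.feedback, sizeof r.feedback, "%s",
             r.proved ? "" : "unproved goal: pointer p may be invalid");
    return r;
}
```

With these stubs, the first attempt lacks the validity pre‑condition, the simulated verifier rejects it, and the second round succeeds. Each new prompt carries only the latest attempt and its feedback rather than the whole history, which keeps the prompt bounded; a real system might choose to keep more context.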
By treating the CPG as a “semantic scaffold,” AutoACSL aims to make the LLM's output not only syntactically correct but also semantically anchored to the program's actual behavior, with the verifier acting as the final check.
How It Works in Practice
Conceptual Workflow
- Parse and Graph Build: The source file is parsed by a CPG generator (e.g., Joern). The resulting graph captures data‑flow edges, control‑flow paths, and type information.
- Feature Selection: From the graph, AutoACSL extracts a concise set of features—such as loop bounds, pointer dereference conditions, and function call signatures—that are most relevant for specification.
- Prompt Construction: These features are embedded into a templated prompt that includes natural‑language instructions (e.g., “Write a pre‑condition that ensures the pointer `buf` is non‑null and points to a buffer of at least `len` bytes”).
- LLM Generation: A large language model (e.g., GPT‑4, Claude) receives the prompt and returns a candidate ACSL contract.
- Verification Attempt: The candidate contract is passed to Frama‑C/WP. If every proof obligation is discharged, the contract is accepted; otherwise, the verifier reports the unproved goals, together with a counterexample when the back‑end prover can supply one.
- Iterative Refinement: The counterexample is translated back into graph features (e.g., “the loop may iterate more than the guessed bound”) and appended to the prompt for a second generation pass.
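Feature selection, prompt construction, and iterative refinement together amount to turning graph facts into prompt text. The sketch below assumes a hypothetical `FunctionFeatures` record (a signature, pointer/size pairs, a loop bound, and an optional hint translated from the last failed proof) and a template of our own wording; the paper's actual feature set and prompt template are not reproduced here.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fact pulled from the CPG: a pointer parameter and the
   parameter that bounds how many elements it may access. */
typedef struct {
    const char *name;        /* e.g. "buf" */
    const char *size_param;  /* e.g. "len" */
    int is_written;          /* nonzero if the function stores through it */
} PointerFact;

/* Hypothetical per-function feature record fed to the prompt template. */
typedef struct {
    const char *signature;        /* function signature text */
    const PointerFact *pointers;  /* pointer/size pairs found in the graph */
    size_t n_pointers;
    const char *loop_bound;       /* bound of the main loop, or NULL */
    const char *verifier_hint;    /* hint from the last failed proof, or NULL */
} FunctionFeatures;

/* Append formatted text at out[used] without writing past cap. Returns the
   new logical length, which exceeds cap - 1 once output is truncated. */
static size_t append(char *out, size_t cap, size_t used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(used < cap ? out + used : NULL,
                      used < cap ? cap - used : 0, fmt, ap);
    va_end(ap);
    return n < 0 ? used : used + (size_t)n;
}

/* Render the features into a prompt; returns the full length the prompt
   needs (like snprintf), so callers can detect truncation. */
size_t build_prompt(const FunctionFeatures *f, char *out, size_t cap)
{
    size_t used = append(out, cap, 0, "Write an ACSL contract for `%s`.\n",
                         f->signature);
    for (size_t i = 0; i < f->n_pointers; i++) {
        const PointerFact *p = &f->pointers[i];
        used = append(out, cap, used,
                      "- Pre-condition: `%s` points to a %s buffer of at least `%s` elements.\n",
                      p->name, p->is_written ? "writable" : "readable", p->size_param);
    }
    if (f->loop_bound != NULL)
        used = append(out, cap, used,
                      "- The main loop runs at most `%s` times; give loop invariants and a loop variant.\n",
                      f->loop_bound);
    if (f->verifier_hint != NULL)  /* refinement pass: translated counterexample */
        used = append(out, cap, used,
                      "- The previous contract failed to verify: %s.\n", f->verifier_hint);
    return append(out, cap, used, "Reply with the ACSL annotation block only.\n");
}
```

Returning the untruncated length, as `snprintf` does, lets a caller detect an undersized buffer and retry with a larger one; on the first generation pass `verifier_hint` is simply `NULL`.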
Interaction Between Components
The static analysis module and the LLM operate as complementary agents. The analysis module supplies deterministic, tool‑derived facts, while the LLM contributes creative reasoning to express those facts in ACSL syntax. The feedback loop acts as a negotiation protocol: each failed proof informs the next prompt, gradually converging on a contract that satisfies both the verifier’s logical constraints and the programmer’s intent.
What Makes This Approach Different
- Semantic Grounding: Unlike pure‑LLM methods, AutoACSL does not rely on the model's guesses alone; each clause is meant to be traceable to a concrete graph node and is ultimately checked by the verifier.
- Proof‑Guided Iteration: The verifier’s counterexamples serve as a precise error signal, enabling targeted prompt adjustments rather than blind regeneration.
- Toolchain Compatibility: The framework plugs directly into existing Frama‑C workflows, allowing teams to adopt it without overhauling their verification pipelines.
Evaluation & Results
Experimental Setup
The authors assembled a benchmark of 604 open‑source C functions drawn from embedded systems, cryptographic libraries, and classic algorithm implementations. Each function was manually annotated with a gold‑standard ACSL contract to serve as a reference.
AutoACSL was evaluated against two baselines:
- LLM‑Only: Prompting the same language model with the raw source code but without CPG features.
- Static‑Only: Generating contracts solely from heuristic static analysis (no LLM).
Key Findings
- Specification Coverage: AutoACSL produced syntactically valid ACSL contracts for 92 % of the functions, compared with 68 % for the LLM‑Only baseline.
- Proof Success Rate: When fed to Frama‑C/WP, 74 % of AutoACSL’s contracts were fully discharged (i.e., the verifier proved all proof obligations), versus 31 % for LLM‑Only and 45 % for Static‑Only.
- Iteration Efficiency: The average number of refinement loops per function was 1.8, indicating that most contracts converged after one or two feedback cycles.
- Human Effort Reduction: Manual annotation time dropped from an average of 12 minutes per function to under 2 minutes when engineers used AutoACSL as a starting point.
These results demonstrate that integrating semantic graph data with LLM prompting yields contracts that are both more complete and more verifiable than either technique alone.
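As a rough check on the time figures, the per‑function saving follows directly from the reported averages. The totals below are purely an extrapolation (the authors do not report an aggregate): they apply the same averages to all 604 benchmark functions.

```latex
\begin{aligned}
\text{saving per function} &> \frac{12 - 2}{12} = \frac{5}{6} \approx 83\% \\
604 \times 12\ \text{min} &= 7248\ \text{min} \approx 120.8\ \text{h} \quad \text{(manual)} \\
604 \times 2\ \text{min} &= 1208\ \text{min} \approx 20.1\ \text{h} \quad \text{(upper bound with AutoACSL)}
\end{aligned}
```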
Why This Matters for AI Systems and Agents
AutoACSL’s ability to synthesize machine‑checkable specifications has immediate implications for AI‑driven development pipelines. Agents that automatically generate, test, and deploy code can now anchor their output in formal contracts, reducing the risk of silent bugs in safety‑critical domains such as automotive firmware, medical devices, and aerospace control software.
Moreover, the feedback‑driven loop exemplifies a broader pattern for AI‑agent orchestration: combine deterministic analysis (the “verifier”) with generative reasoning (the “LLM”) and let the verifier’s failures steer the next generation step. This paradigm can be reused for tasks beyond specification, such as automated test‑case generation, security policy synthesis, or model‑based reinforcement learning where formal guarantees are required.
Enterprises looking to embed formal methods into their CI/CD pipelines can leverage AutoACSL as a plug‑in component, turning a traditionally manual verification step into an automated, AI‑augmented service. For teams already using the Enterprise AI platform by UBOS, AutoACSL could be wrapped as a micro‑service that receives source snippets, returns ACSL contracts, and feeds the results back into existing static analysis dashboards.
What Comes Next
While AutoACSL marks a significant advance, several open challenges remain:
- Scalability to Large Codebases: Current experiments focus on isolated functions. Extending the approach to whole‑program verification will require hierarchical graph summarization and cross‑module prompt coordination.
- Model Generalization: The framework relies on proprietary LLMs; exploring open‑weight models (e.g., LLaMA, run locally through tools such as Ollama) could democratize access and reduce licensing constraints.
- Other Specification Languages: ACSL is powerful but specific to C. Adapting the pipeline to other annotation languages (e.g., JML for Java, Gospel for OCaml) would broaden its applicability.
- Human‑in‑the‑Loop Interfaces: Providing engineers with visual diff tools that highlight how each refinement step changes the contract could improve trust and adoption.
Future research may also explore tighter integration with AI‑orchestrated testing frameworks, where generated contracts automatically seed property‑based tests that further validate the code before formal proof attempts.
For organizations eager to experiment with AI‑augmented verification, the UBOS platform overview offers a low‑friction environment to prototype such pipelines, combine them with existing CI tools, and monitor verification outcomes at scale.
References
- Han Zhou, Yu Luo, Dianxiang Xu. “AutoACSL: Synthesizing ACSL Specifications by Integrating LLMs with CPG-Based Static Analysis.” arXiv preprint, 2026.
- Frama‑C WP plug‑in documentation. CEA List and Inria.
- Joern: Code Property Graph platform. https://joern.io.