- Updated: March 13, 2026
- 6 min read
Prompt Caching Tool Revolutionizes AI Development with Token Savings
Prompt Caching is an open‑source MIT‑licensed tool that cuts AI token consumption by up to 92 % by automatically caching stable prompt fragments and re‑using them across multiple turns.
Why Prompt Caching matters for AI developers today
AI‑driven development platforms such as Claude Code, ChatGPT, and Perplexity charge per token, turning every extra request into a measurable cost. For tech enthusiasts, startups, and enterprises that run long coding sessions, these costs can quickly become prohibitive. Prompt Caching solves this problem by storing stable content server‑side for a short window (default five minutes) and serving it back at a fraction of the original token price.
Beyond raw savings, the tool introduces four distinct session modes that adapt to common development workflows—bug fixing, refactoring, file tracking, and conversation freezing—making it a versatile addition to any AI‑augmented coding stack.

How Prompt Caching works under the hood
Prompt Caching leverages Anthropic’s caching API, which distinguishes between cache creation (cost ≈ 1.25× a normal request) and cache reads (cost ≈ 0.1×). When a prompt contains stable content—such as a file’s source code, a style guide, or a previously generated error trace—the plugin inserts a hidden breakpoint. The first time the fragment is sent, it is cached; subsequent reads retrieve the cached version at a dramatically reduced token price.
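The breakpoint mechanism described above can be sketched as plain data manipulation. The block shape below mirrors Anthropic's documented content-block format (a `cache_control` field of type `"ephemeral"` marks the end of the cacheable prefix); the helper function itself is hypothetical, a minimal illustration rather than the plugin's actual code:

```python
def mark_cacheable(stable_text: str, volatile_text: str) -> list[dict]:
    """Build a content-block list in the style of Anthropic's Messages API.

    The stable fragment (file source, style guide, error trace) carries an
    ephemeral cache_control breakpoint; everything after it is sent fresh.
    """
    return [
        {
            "type": "text",
            "text": stable_text,
            # First send pays ~1.25x (cache write); later sends pay ~0.1x (cache read).
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": volatile_text,  # the new question, billed at full price every turn
        },
    ]
```

In a real request these blocks would be passed as the `system` or message content of an API call; the plugin's job is to decide automatically where the stable/volatile boundary sits.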
The plugin is language‑agnostic and integrates with any MCP‑compatible client (Claude Code, Cursor, Windsurf, Zed, Continue.dev, etc.). No manual configuration is required; the tool detects the appropriate breakpoints automatically, allowing developers to focus on logic rather than token accounting.
Quantifiable benefits and token savings
Real‑world benchmarks performed on Claude Code’s Sonnet model demonstrate dramatic reductions:
- Bug‑fix sessions saved up to 85 % of tokens.
- Refactor sessions (five files) saved 80 %.
- General coding sessions saved an average of 92 %.
- Repeated file reads (5 × 5) achieved a 90 % reduction.
These savings compound after the first turn because every subsequent read incurs only 0.1× the original cost. For teams that run dozens of turns per day, the cumulative effect translates into thousands of dollars saved on API usage.
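The compounding effect is simple arithmetic: turn one pays the ~1.25x cache-write premium, every later turn pays ~0.1x for the same fragment. A small sketch (the multipliers match the figures quoted earlier; the function itself is illustrative):

```python
def session_cost(base_tokens: int, turns: int,
                 write_mult: float = 1.25, read_mult: float = 0.10):
    """Compare re-sending a stable fragment every turn vs. caching it once.

    Returns (uncached_token_cost, cached_token_cost) in billed-token units.
    """
    uncached = base_tokens * turns
    # Turn 1: cache write at a premium; turns 2..N: cheap cache reads.
    cached = base_tokens * write_mult + base_tokens * read_mult * (turns - 1)
    return uncached, cached

# Example: a 10,000-token file referenced across 20 turns.
uncached, cached = session_cost(10_000, 20)
savings = 1 - cached / uncached  # roughly 84 % saved
```

With these assumed multipliers the cached session is already cheaper by the second turn, which is consistent with the break-even behaviour described in the benchmarks below.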
The four session modes that power Prompt Caching
🐛 BugFix Mode
When a stack trace appears in the conversation, BugFix Mode captures the offending file together with the error context. The first time the file is sent, it is cached; follow‑up questions only pay for the new diagnostic query, not the entire file again.
♻️ Refactor Mode
Refactor Mode watches for keywords such as “refactor”, “rename”, or “extract method”. It caches the original code pattern, style guides, and type definitions. Subsequent instructions that target individual files reuse the cached baseline, dramatically cutting token usage for multi‑file refactors.
📂 File Tracking Mode
Every file read is counted. On the second read, the plugin injects a cache breakpoint automatically, turning the file into a cached resource for the remainder of the session. This “always‑on” mode works without any explicit user action.
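The "inject a breakpoint on the second read" rule can be sketched as a tiny counter. This is a hypothetical reconstruction of File Tracking Mode's logic, not the plugin's actual implementation:

```python
from collections import Counter


class FileTracker:
    """Count reads per file; flag a cache breakpoint on the second read."""

    def __init__(self) -> None:
        self.reads: Counter[str] = Counter()
        self.cached: set[str] = set()

    def record_read(self, path: str) -> bool:
        """Return True exactly once: when this file should become cached."""
        self.reads[path] += 1
        if self.reads[path] >= 2 and path not in self.cached:
            self.cached.add(path)
            return True  # inject the breakpoint now; later reads hit the cache
        return False
```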
🧊 Conversation Freeze Mode
After a configurable number of turns N, Conversation Freeze freezes all messages before turn N − 3 as a cached prefix. Only the last three turns are sent fresh, ensuring that long back‑and‑forth dialogues stay cheap while preserving context.
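Splitting a conversation into a frozen prefix and a fresh tail reduces to a slice. A minimal sketch, assuming the last three turns are kept fresh as described above (function name and signature are illustrative):

```python
def freeze_split(messages: list, keep_fresh: int = 3) -> tuple[list, list]:
    """Split messages into (frozen_prefix, fresh_tail).

    The frozen prefix is eligible for caching; the tail is sent uncached.
    Short conversations are left entirely fresh.
    """
    if len(messages) <= keep_fresh:
        return [], messages
    return messages[:-keep_fresh], messages[-keep_fresh:]
```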
Benchmark results – token savings by session type
| Session type | Turns | Tokens without caching | Tokens with caching | Savings |
|---|---|---|---|---|
| BugFix (single file) | 20 | 184,000 | 28,400 | 85 % |
| Refactor (5 files) | 15 | 310,000 | 61,200 | 80 % |
| General coding | 40 | 890,000 | 71,200 | 92 % |
| Repeated file reads (5 × 5) | — | 50,000 | 5,100 | 90 % |
The break‑even point is typically reached after the second turn, after which every subsequent interaction yields pure savings.
Getting started: installation in minutes
Prompt Caching can be installed in two ways, depending on your workflow preference.
🚀 One‑click install for Claude Code (recommended)
- Open Claude Code.
- Add the marketplace in the plugin console:

```
/plugin marketplace add https://github.com/flightlesstux/prompt-caching
```

- Install the plugin:

```
/plugin install prompt-caching@ercan-ermis
```

- After installation, the `get_cache_stats` command becomes available immediately.
💻 Global npm installation for any MCP client
- Install the package globally from your terminal:

```
npm install -g prompt-caching-mcp
```

- Add the service to your client’s MCP configuration. Example for Cursor:

```json
{
  "mcpServers": {
    "prompt-caching-mcp": {
      "command": "prompt-caching-mcp"
    }
  }
}
```

- Restart the client (if required) and verify the plugin is active by issuing `prompt-caching-mcp --help`.
Both methods require no additional configuration files (and Claude Code needs no restart), and they work out of the box with popular MCP clients such as Zed, Continue.dev, and Windsurf.
Open‑source licensing and community support
Prompt Caching is released under the permissive MIT License. This means you can freely use, modify, and redistribute the code in commercial or private projects without worrying about licensing fees. The source code lives on GitHub, where contributors regularly submit enhancements, bug fixes, and new session‑mode ideas.
Because the tool is MIT‑licensed, there is zero lock‑in. You can fork the repository, integrate it with your own CI/CD pipeline, or even embed it into a proprietary AI platform without violating the license.
Conclusion: unlock massive token savings today
Prompt Caching delivers a clear, measurable ROI for anyone who builds or maintains AI‑driven applications. By automatically caching stable prompt fragments, it reduces token consumption by up to 92 %, shortens response times, and eliminates the need for manual prompt engineering.
If you’re ready to integrate token‑saving intelligence into your workflow, explore the UBOS platform overview for a broader AI automation ecosystem, or jump straight into a ready‑made template such as the AI SEO Analyzer to see Prompt Caching in action.
For startups looking to accelerate AI adoption, the UBOS for startups page outlines special pricing and support options. SMBs can benefit from UBOS solutions for SMBs, while enterprises may explore the Enterprise AI platform by UBOS for large‑scale deployments.
Need a visual editor to prototype your own Prompt Caching‑enabled app? Try the Web app editor on UBOS or automate complex workflows with the Workflow automation studio. Pricing details are transparent on the UBOS pricing plans page.
Finally, join the UBOS partner program to collaborate on future AI tooling, share your Prompt Caching extensions, and stay ahead of the token‑optimization curve.
Start saving tokens now—install Prompt Caching and watch your AI development costs shrink dramatically.