Carlos
  • Updated: January 30, 2026
  • 6 min read

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Direct Answer

MeanCache is a training‑free caching framework that accelerates inference for Flow Matching generative models by caching average‑velocity information instead of instantaneous velocities. By leveraging cached Jacobian‑Vector Products (JVPs) and a trajectory‑stability scheduler, MeanCache reduces the number of expensive neural network evaluations while preserving sample quality, making large‑scale diffusion‑style generation faster and more cost‑effective.

[Figure: diagram illustrating MeanCache average‑velocity caching]

Background: Why This Problem Is Hard

Flow Matching models have emerged as a powerful alternative to traditional diffusion models, offering continuous‑time formulations that can, in principle, generate high‑fidelity samples with fewer steps. In practice, however, each inference step requires evaluating the model’s velocity field—a costly operation that scales linearly with the number of steps. Existing acceleration tricks, such as instantaneous velocity caching, store the exact velocity at a given state and reuse it in subsequent steps. While this reduces the number of forward passes, it introduces two critical issues:

  • Trajectory deviation: The cached velocity is only accurate for the exact state at which it was computed. As the sampler moves away from that state, the reused velocity quickly becomes stale, leading to drift.
  • Error accumulation: Small deviations compound over many steps, degrading sample quality and sometimes causing divergence.

These problems are amplified in high‑resolution image and video generation, where each step processes millions of pixels and the model’s Jacobian is large. Consequently, practitioners face a trade‑off between speed (fewer model evaluations) and fidelity (accurate trajectories). A robust solution must cut computation without sacrificing the stability that underpins high‑quality generation.

What the Researchers Propose

The MeanCache framework reframes the caching problem from “store the exact instantaneous velocity” to “store the average velocity over a short trajectory segment.” The key insight is that the integral of the velocity field—i.e., the average direction of motion—captures the essential dynamics needed for the next few steps. By caching this average velocity, MeanCache can safely skip multiple forward passes while keeping the sampler on a stable path.
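In symbols (notation ours, following the standard flow‑matching setup): if v(x_s, s) denotes the instantaneous velocity field, the quantity MeanCache approximates is the average velocity over a short segment [t, t+Δ]:

```latex
u(x_t, t, \Delta) \;=\; \frac{1}{\Delta} \int_{t}^{t+\Delta} v(x_s, s)\, ds,
\qquad
x_{t+\Delta} \;=\; x_t + \Delta \, u(x_t, t, \Delta).
```

A single cached estimate of u advances the state across the whole segment by construction, whereas reusing the pointwise velocity v(x_t, t) over the same segment is only accurate to first order in Δ.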

MeanCache consists of three conceptual components:

  1. Average‑Velocity Cache: Instead of a single velocity vector, the cache holds a Jacobian‑Vector Product (JVP) that represents the average change in the latent state over a predefined horizon.
  2. Trajectory‑Stability Scheduler: A lightweight algorithm that decides when to refresh the cache based on a stability metric, ensuring that the sampler does not stray too far from the cached trajectory.
  3. Peak‑Suppressed Shortest Path (PSSP) Planner: An optimization routine that selects the most efficient sequence of cached and fresh evaluations under a computational budget, suppressing peaks in error that could destabilize generation.
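A first‑order version of the average‑velocity cache can be built with a single extra JVP. The sketch below (PyTorch; function name and the Taylor‑expansion approximation are ours, not the paper's exact method) uses `torch.func.jvp` to get the total time derivative of the velocity along the trajectory, then averages the resulting linear velocity profile over the horizon:

```python
import torch
from torch.func import jvp

def average_velocity(model, x, t, dt):
    """First-order estimate of the average velocity over [t, t+dt].

    dv/dt = ∂v/∂t + (∂v/∂x)·v is the total time derivative of the
    velocity along the trajectory, obtained with one forward-mode JVP.
    The mean of the linear profile v(t) + s·dv/dt over s ∈ [0, dt]
    is v + (dt/2)·dv/dt.
    """
    v = model(x, t)  # instantaneous velocity (one forward pass)
    # JVP with tangents (v, 1) yields (∂v/∂x)·v + ∂v/∂t in one pass
    _, dv_dt = jvp(lambda x, t: model(x, t), (x, t), (v, torch.ones_like(t)))
    return v + 0.5 * dt * dv_dt
```

For a velocity field that is exactly linear in x and t, this estimate is exact, which makes it easy to sanity‑check on toy models before wiring it into a real sampler.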

How It Works in Practice

The MeanCache workflow can be broken down into a repeatable loop that integrates seamlessly with existing Flow Matching samplers:

  1. Initialize: The sampler starts at the standard initial latent and performs a full model evaluation to obtain the true velocity and its JVP.
  2. Cache Construction: Using the JVP, MeanCache computes the average velocity over a short horizon (e.g., 4–8 steps) and stores it in the cache.
  3. Stability Check: Before each subsequent step, the scheduler estimates the deviation between the predicted state (using the cached average velocity) and the state that would result from a fresh evaluation. If the deviation exceeds a predefined threshold, the cache is refreshed.
  4. Step Execution: When the cache is valid, the sampler advances multiple steps by applying the cached average velocity, effectively “jumping” forward without additional forward passes.
  5. Budget Management: The PSSP planner monitors the overall computational budget (e.g., maximum number of model evaluations) and dynamically adjusts the horizon length to maximize speedup while respecting quality constraints.
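The loop above can be sketched as follows (PyTorch; the fixed‑threshold drift estimate here is a simplified stand‑in for the paper's stability metric, and the PSSP planner is reduced to a fixed horizon):

```python
import torch

def meancache_sample(model, x0, n_steps, horizon=4, tol=0.05):
    """Sketch of a MeanCache-style sampling loop (names hypothetical).

    A fresh evaluation refreshes the cached velocity; within a horizon,
    steps reuse the cache as long as a cheap drift estimate stays
    under `tol`.
    """
    x = x0
    dt = 1.0 / n_steps
    cached_v = None
    steps_since_refresh = 0
    evals = 0
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        if cached_v is None or steps_since_refresh >= horizon:
            cached_v = model(x, t)           # full forward pass
            evals += 1
            steps_since_refresh = 0
        else:
            # cheap stability check: relative drift of the state
            # accumulated since the last refresh
            drift = (dt * steps_since_refresh * cached_v).norm() / (x.norm() + 1e-8)
            if drift > tol:
                cached_v = model(x, t)       # refresh on excessive drift
                evals += 1
                steps_since_refresh = 0
        x = x + dt * cached_v                # Euler step with cached velocity
        steps_since_refresh += 1
    return x, evals
```

With a well‑behaved velocity field, `evals` comes out well below `n_steps`, which is exactly the speedup the scheduler is designed to harvest.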

This approach differs from prior caching schemes in two fundamental ways:

  • It leverages average dynamics rather than pointwise velocities, which are inherently more robust to state drift.
  • The stability‑driven scheduler provides a principled, data‑driven trigger for cache refreshes, eliminating the need for heuristic step‑size tuning.

Evaluation & Results

The authors benchmarked MeanCache on three state‑of‑the‑art generative models:

  • FLUX.1: A 12‑billion‑parameter text‑to‑image model.
  • Qwen‑Image: An image‑generation model with strong multilingual text‑rendering capabilities.
  • HunyuanVideo: A high‑resolution video diffusion system.

Across these models, MeanCache achieved the following:

Model          Baseline Steps   MeanCache Steps   Speed‑up Factor   Quality Impact (FID ↓)
FLUX.1         50               20                2.5×              +0.02 (negligible)
Qwen‑Image     40               16                2.5×              +0.03 (negligible)
HunyuanVideo   100              35                2.9×              +0.05 (minor)
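As a quick sanity check, the speed‑up column follows directly from the step counts (a tiny script with the numbers copied from the table above):

```python
# Speed-up ≈ baseline model evaluations / MeanCache evaluations
results = {
    "FLUX.1": (50, 20),
    "Qwen-Image": (40, 16),
    "HunyuanVideo": (100, 35),
}
for name, (baseline, cached) in results.items():
    speedup = baseline / cached
    saved = 1 - cached / baseline
    print(f"{name}: {speedup:.1f}x speed-up, {saved:.0%} fewer forward passes")
```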

Key observations from the experiments include:

  • Consistent speedups: MeanCache reduced the number of forward passes by roughly 60‑70% without requiring any model retraining.
  • Preserved fidelity: Fréchet Inception Distance (FID) scores changed only marginally, confirming that average‑velocity caching does not compromise visual quality.
  • Scalability: The framework performed equally well on image and video models, demonstrating its applicability to diverse generative tasks.

All results were obtained using the authors’ reference implementation, and the codebase is publicly available for reproducibility. For a deeper dive into the experimental setup, see the full paper MeanCache: Training‑Free Acceleration for Flow Matching.

Why This Matters for AI Systems and Agents

MeanCache directly addresses a bottleneck that limits the deployment of high‑quality generative models in production environments. By cutting inference cost without sacrificing output fidelity, it enables several practical advances:

  • Real‑time generation: Applications such as interactive design tools, AI‑assisted content creation, and on‑device synthesis can now meet latency requirements that were previously unattainable.
  • Cost‑effective scaling: Cloud providers and enterprises can serve more requests per GPU hour, reducing operational expenses and carbon footprint.
  • Agent‑centric pipelines: Autonomous agents that rely on visual imagination—e.g., robotics planners or game AI—can incorporate richer generative feedback loops without overwhelming compute budgets.

For teams building end‑to‑end AI products, MeanCache can be integrated as a drop‑in module within existing Flow Matching pipelines. Our Flow Matching Accelerators guide provides step‑by‑step instructions for wiring MeanCache into popular frameworks such as PyTorch and JAX.

What Comes Next

While MeanCache demonstrates strong empirical gains, several avenues remain open for further research and engineering:

  • Adaptive horizon selection: Current implementations use a fixed averaging window. Learning a dynamic horizon based on model confidence could yield additional speedups.
  • Cross‑modal extensions: Applying average‑velocity caching to multimodal diffusion (e.g., text‑to‑video or audio‑guided generation) may uncover new stability‑driven acceleration patterns.
  • Hardware‑aware scheduling: Integrating MeanCache with specialized inference accelerators (TPUs, GPUs with tensor cores) could further reduce latency by aligning cache refreshes with hardware pipelines.

We anticipate that the community will explore “stability‑driven acceleration” as a broader research theme, extending beyond Flow Matching to other continuous‑time generative frameworks. To stay updated on the latest developments and to contribute to open‑source implementations, visit our MeanCache Community Hub.

