X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
arXiv cs.CV / 4/23/2026
Key Points
- X-Cache is a training-free inference acceleration method for few-step autoregressive world models, targeting the high inference cost that hinders interactive deployment.
- Few-step distilled models lack the redundancy across denoising steps that conventional diffusion caching reuses, so X-Cache instead caches block residuals across consecutive generation chunks.
- It uses a dual-metric gating strategy based on structure- and action-aware block-input fingerprints to decide per block whether to recompute or reuse cached residuals.
- To avoid errors contaminating the persistent autoregressive KV cache, X-Cache detects KV update chunks and forces full computation on them, preventing error propagation.
- Implemented on X-world (a production multi-camera driving world model with multi-block causal DiT and rolling KV cache), X-Cache reports a 71% block skip rate and a 2.6× wall-clock speedup with minimal degradation.
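
The per-block gating and the forced recomputation on KV update chunks described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the summary does not specify the fingerprint functions, the two metrics, or the thresholds, so `rel_l1`, `CrossChunkCache`, `tau_structure`, `tau_action`, and the use of the raw input vector as its own fingerprint are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


def rel_l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Relative L1 distance of a from the reference vector b."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) + 1e-8
    return num / den


@dataclass
class CacheEntry:
    structure_fp: List[float]  # stand-in fingerprint of the block input
    action_fp: List[float]     # stand-in fingerprint of the action conditioning
    residual: List[float]      # block output minus block input


@dataclass
class CrossChunkCache:
    tau_structure: float = 0.05  # assumed threshold, not from the paper
    tau_action: float = 0.05     # assumed threshold, not from the paper
    entries: Dict[int, CacheEntry] = field(default_factory=dict)
    skipped: int = 0
    computed: int = 0

    def run_block(
        self,
        idx: int,
        x: List[float],
        action: List[float],
        block_fn: Callable[[List[float]], List[float]],
        is_kv_update_chunk: bool,
    ) -> List[float]:
        entry = self.entries.get(idx)
        # Reuse only if both metrics agree and this chunk does not write the KV cache.
        reuse = (
            entry is not None
            and not is_kv_update_chunk
            and rel_l1(x, entry.structure_fp) < self.tau_structure
            and rel_l1(action, entry.action_fp) < self.tau_action
        )
        if reuse:
            self.skipped += 1
            return [xi + ri for xi, ri in zip(x, entry.residual)]
        # Full computation: run the block and refresh the cached residual.
        y = block_fn(x)
        residual = [yi - xi for yi, xi in zip(y, x)]
        self.entries[idx] = CacheEntry(list(x), list(action), residual)
        self.computed += 1
        return y


# Toy usage: one "block" that doubles its input, evaluated on successive chunks.
cache = CrossChunkCache()
double = lambda x: [2.0 * v for v in x]
act = [1.0, 0.0]
y0 = cache.run_block(0, [1.0, 2.0], act, double, is_kv_update_chunk=False)   # computed
y1 = cache.run_block(0, [1.01, 2.0], act, double, is_kv_update_chunk=False)  # reused
y2 = cache.run_block(0, [1.01, 2.0], act, double, is_kv_update_chunk=True)   # forced
y3 = cache.run_block(0, [1.01, 2.0], [0.0, 1.0], double, is_kv_update_chunk=False)  # action changed
```

In this toy run only the second call reuses the cached residual. The third is recomputed because it is flagged as a KV update chunk, and the fourth because the action fingerprint changed, which mirrors the two safeguards the key points describe.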