VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
arXiv cs.RO / 4/30/2026
Key Points
- The paper introduces VLN-Cache, a training-free token caching method for vision-and-language navigation (VLN) models to reduce inference cost for real-time use.
- It argues that prior caching approaches break down in VLN because both visual dynamics (viewpoint changes move token positions) and semantic dynamics (token relevance changes across navigation stages) make cached tokens misaligned or stale.
- VLN-Cache addresses these issues with view-aligned remapping, which restores geometric correspondences after viewpoint changes, and a task-relevance saliency filter that blocks reuse at semantic transition points (see the first sketch after this list).
- It also uses a layer-adaptive entropy policy that sets a per-layer token-reuse budget, improving the speed/accuracy trade-off (see the second sketch below).
- On the R2R-CE simulation benchmark, VLN-Cache achieves up to 1.52x faster inference while keeping competitive navigation success rates.
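The paper's exact algorithms aren't reproduced here, but a minimal sketch can illustrate the two mechanisms in the third bullet. Everything below is a hypothetical illustration under assumed interfaces, not VLN-Cache's actual implementation: the function names, the nearest-neighbor matching, the cosine-similarity gate, and the thresholds are all our own.

```python
import numpy as np

def remap_cached_tokens(cache, projected_old, new_positions, dist_thresh=0.5):
    """View-aligned remapping (illustrative): after a viewpoint change,
    reuse each cached token at the new grid slot its old position projects
    closest to, and flag slots with no nearby match for recomputation.

    cache:          (N_old, D) cached token features from the previous view
    projected_old:  (N_old, 2) old token positions projected into the new view
    new_positions:  (N_new, 2) token positions in the current view
    """
    # Pairwise distances between current slots and projected old positions.
    dists = np.linalg.norm(
        new_positions[:, None, :] - projected_old[None, :, :], axis=-1
    )
    nearest = dists.argmin(axis=1)            # best old token per new slot
    reuse = dists.min(axis=1) < dist_thresh   # reuse only if geometrically close
    remapped = cache[nearest].copy()
    return remapped, ~reuse                   # mask marks slots to recompute

def semantic_transition(saliency_prev, saliency_now, sim_thresh=0.8):
    """Task-relevance gate (illustrative): if the distribution of
    instruction-conditioned token saliency shifts sharply between steps,
    treat it as a semantic transition and skip cache reuse entirely.
    """
    a = saliency_prev / (np.linalg.norm(saliency_prev) + 1e-9)
    b = saliency_now / (np.linalg.norm(saliency_now) + 1e-9)
    return float(a @ b) < sim_thresh          # low cosine similarity => flush
```

In this reading, the agent would call `semantic_transition` first at each step; only when it returns False does `remap_cached_tokens` fill in reusable tokens, and the encoder recomputes the flagged slots.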
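For the entropy policy, the summary only says the reuse budget is per-layer and entropy-driven. A hedged sketch follows, assuming (our guess, not stated in the summary) that layers with more diffuse attention tolerate a larger reuse fraction; `attention_entropy` and `per_layer_reuse_budget` are hypothetical names.

```python
import numpy as np

def attention_entropy(attn):
    """Mean Shannon entropy of softmax attention rows; attn is (heads, Q, K)."""
    p = np.clip(attn, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

def per_layer_reuse_budget(layer_entropies, min_frac=0.2, max_frac=0.9):
    """Map each layer's attention entropy to a token-reuse fraction.

    Assumption (ours, not the paper's): high-entropy layers spread attention
    across many tokens, so individually stale cached tokens matter less and
    those layers can safely reuse a larger fraction of the cache.
    """
    e = np.asarray(layer_entropies, dtype=float)
    span = np.ptp(e) + 1e-9                # avoid divide-by-zero when flat
    scaled = (e - e.min()) / span          # normalize entropies to [0, 1]
    return min_frac + scaled * (max_frac - min_frac)
```

A runner could then keep the `budget[l] * N` most salient cached tokens at layer `l` and recompute the rest, which is one way a per-layer budget can trade speed for accuracy.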