The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation
arXiv cs.CV, April 22, 2026
Key Points
- Vision-and-language navigation (VLN) agents can self-improve from policy-induced experience, but this only works when behavioral diversity and learning stability are carefully balanced to produce a reliable learning signal.
- Increasing diversity alone can destabilize the learning signal, while overly strict stability constraints reduce exploration and cause early commitment, undermining reliable self-improvement.
- The paper introduces Stability-Diversity Balance (SDB), a plug-and-play method that generates multiple latent behavioral hypotheses per decision step by applying controlled shifts to instruction-conditioned hidden states.
- SDB evaluates and aggregates these hypotheses with reliability-aware soft weighting, and adds an explicit regularizer that prevents hypothesis drift or collapse, enabling stable self-improvement while preserving the training signal.
- Experiments on R2R, SOON, and REVERIE show consistent gains, including on REVERIE val-unseen where SPL increases from 33.73 to 35.93 and OSR from 51.07 to 54.25.
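The summary above describes SDB's core loop only at a high level: perturb an instruction-conditioned hidden state into several hypotheses, score each hypothesis, softly aggregate by reliability, and regularize against drift. A minimal sketch of that loop is below; the function names, Gaussian perturbation scheme, softmax reliability weighting, and squared-distance drift penalty are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def generate_hypotheses(hidden, num_hypotheses=4, shift_scale=0.1, rng=None):
    """Apply small controlled shifts (assumed Gaussian here) to an
    instruction-conditioned hidden state, yielding multiple latent
    behavioral hypotheses for one decision step."""
    rng = rng or np.random.default_rng(0)
    shifts = shift_scale * rng.standard_normal((num_hypotheses, hidden.shape[-1]))
    return hidden[None, :] + shifts


def reliability_weights(scores, temperature=1.0):
    """Turn per-hypothesis reliability scores into soft aggregation
    weights via a temperature-scaled softmax (an assumed choice)."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # numerical stability
    w = np.exp(z)
    return w / w.sum()


def aggregate(hypotheses, weights):
    """Reliability-weighted soft aggregation of hypothesis states."""
    return (weights[:, None] * hypotheses).sum(axis=0)


def drift_penalty(hypotheses, anchor, coeff=0.01):
    """Illustrative regularizer: penalize hypotheses that drift far
    from the unperturbed anchor state, discouraging drift/collapse."""
    return coeff * np.mean(np.sum((hypotheses - anchor[None, :]) ** 2, axis=-1))
```

In use, the aggregated state would replace the single hidden state when computing the agent's action distribution, and the drift penalty would be added to the training loss; how reliability scores are produced is not specified in the summary and is left abstract here.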