AsgardBench - Evaluating Visually Grounded Interactive Planning Under Minimal Feedback
arXiv cs.AI / 3/18/2026
Key Points
- AsgardBench is a benchmark for visually grounded, high-level action sequence generation and interactive planning. It targets plan adaptation driven by visual observations rather than navigation or low-level manipulation.
- The benchmark isolates interactive planning by restricting inputs to images, action history, and lightweight success/failure signals within a controlled simulator to avoid perception substitutions.
- It comprises 108 task instances across 12 task types with systematic variations to create conditional branches that require plan repair during execution.
- Evaluations indicate that leading vision-language models struggle when visual input is withheld, and the results point to weaknesses in visual grounding and state tracking that hinder interactive planning.
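
The interaction pattern described in the key points (an agent that sees only an image, its action history, and a success/failure flag, and must repair its plan when a task variation appears) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the AsgardBench API: `ToyKitchen`, `visual_policy`, `blind_policy`, `run_episode`, and the mug task are all hypothetical names invented for illustration, and a byte label stands in for a rendered image.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = List[Tuple[str, bool]]  # (action, succeeded) pairs
Policy = Callable[[bytes, History], str]


@dataclass
class ToyKitchen:
    """Hypothetical stand-in for a simulator; not the AsgardBench API."""
    mug_dirty: bool           # the task variation that creates a branch
    mug_filled: bool = False

    def render(self) -> bytes:
        # A real simulator would return pixels; a byte label stands in here.
        if self.mug_filled:
            return b"filled_mug"
        return b"dirty_mug" if self.mug_dirty else b"clean_mug"

    def step(self, action: str) -> bool:
        # Only a success/failure flag comes back, with no explanation.
        if action == "wash_mug" and self.mug_dirty:
            self.mug_dirty = False
            return True
        if action == "fill_mug" and not self.mug_dirty and not self.mug_filled:
            self.mug_filled = True
            return True
        return False


def visual_policy(image: bytes, history: History) -> str:
    # Picks the next high-level action from the current image.
    if image == b"filled_mug":
        return "done"
    if image == b"dirty_mug":
        return "wash_mug"  # plan repair: a step the default plan lacks
    return "fill_mug"


def blind_policy(image: bytes, history: History) -> str:
    # Ignores the image and replays a fixed plan: fill once, then stop.
    return "fill_mug" if not history else "done"


def run_episode(env: ToyKitchen, policy: Policy, max_steps: int = 5) -> Tuple[bool, History]:
    history: History = []
    for _ in range(max_steps):
        action = policy(env.render(), history)
        if action == "done":
            break
        history.append((action, env.step(action)))
    return env.mug_filled, history


if __name__ == "__main__":
    print(run_episode(ToyKitchen(mug_dirty=True), visual_policy))
    print(run_episode(ToyKitchen(mug_dirty=True), blind_policy))
```

In this toy setting the image-reading policy inserts the extra washing step when the mug is dirty, while the fixed-plan policy receives only a failure flag and never completes the task. The sketch shows why minimal feedback alone is not enough to recover from a conditional branch; it makes no claim about how the benchmark itself is implemented or scored.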




