AsgardBench: A benchmark for visually grounded interactive planning

Microsoft Research Blog / 3/27/2026

Key Points

  • AsgardBench is presented as a new benchmark focused on visually grounded interactive planning in embodied AI scenarios where a system must perceive, plan, and revise actions over time.
  • The kitchen-cleaning example highlights the need to handle dynamic, unexpected conditions, such as an object already being in the desired state or extra items in the environment blocking the original plan.
  • The benchmark emphasizes grounding decisions in visual observations and evaluating performance in interactive settings rather than static instruction-following.
  • By targeting these interactive planning challenges, AsgardBench aims to better measure progress toward robust embodied agents that can adapt when outcomes differ from expectations.

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected: for example, when the mug it was asked to wash is already clean, or when the sink is full of other items. This is the domain of embodied AI: systems […]
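The observe, decide, adjust loop in the kitchen example can be illustrated with a toy closed-loop agent. This is a minimal sketch under stated assumptions, not AsgardBench's environment, API, or scoring: the `Kitchen` state and the `observe`, `plan`, `act`, and `run_episode` functions are hypothetical, and a plain dictionary snapshot stands in for the visual observations a real benchmark would provide.

```python
from dataclasses import dataclass, field

# Hypothetical toy world state; NOT AsgardBench's environment or API.
@dataclass
class Kitchen:
    mug_clean: bool = False
    sink_items: list[str] = field(default_factory=list)

def observe(kitchen: Kitchen) -> dict:
    """Return a snapshot that stands in for a visual observation (assumption)."""
    return {"mug_clean": kitchen.mug_clean, "sink_items": list(kitchen.sink_items)}

def plan(obs: dict) -> list[str]:
    """Choose the remaining steps from what is observed right now."""
    if obs["mug_clean"]:
        return []  # goal already satisfied: nothing left to do
    # Clear any blocking items first, then wash the mug.
    steps = [f"remove {item} from sink" for item in obs["sink_items"]]
    steps.append("wash mug")
    return steps

def act(kitchen: Kitchen, step: str) -> None:
    """Apply one step to the toy world."""
    if step == "wash mug":
        kitchen.mug_clean = True
    elif step.startswith("remove "):
        item = step[len("remove "):-len(" from sink")]
        kitchen.sink_items.remove(item)

def run_episode(kitchen: Kitchen, max_steps: int = 10) -> list[str]:
    """Closed loop: re-observe and re-plan after every single action."""
    executed: list[str] = []
    for _ in range(max_steps):
        steps = plan(observe(kitchen))
        if not steps:
            break
        act(kitchen, steps[0])
        executed.append(steps[0])
    return executed
```

With an already-clean mug, `run_episode(Kitchen(mug_clean=True))` executes nothing, and with `Kitchen(sink_items=["plate", "pan"])` it clears both items before washing the mug. Re-planning from a fresh observation after every action, rather than committing to a fixed script up front, is what lets the agent absorb both surprises from the example.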

The post AsgardBench: A benchmark for visually grounded interactive planning appeared first on Microsoft Research.