Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents

arXiv cs.AI / April 29, 2026


Key Points

  • Embodied agents for Vision-and-Language Navigation (VLN) rely on multiple interdependent capabilities, so existing system-level testing struggles to explain which specific capability caused a failure.
  • The paper proposes a capability-oriented testing method that detects failures and attributes them to particular capabilities using adaptive test generation, capability-specific “oracles,” and a feedback loop.
  • Adaptive test cases are generated by selecting seeds and applying mutations to better explore failure modes rather than relying on static evaluations.
  • Experiments indicate the approach finds more failure cases and more precisely identifies capability-level weaknesses than prior baselines.
  • The resulting failure attribution is intended to be more interpretable and actionable for improving embodied agents in safety-critical settings.
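The adaptive test generation described above can be sketched as a simple seed-and-mutate loop: seeds whose mutants have exposed failures before are sampled more often, biasing the search toward weak spots rather than re-running a static benchmark. This is a minimal illustration, not the paper's implementation; the seed pool, scoring rule, and mutation operators (instruction rephrasing, direction swaps) are all hypothetical.

```python
import random

# Hypothetical mutation operators over navigation instructions;
# real mutations might also perturb scenes or start poses.
MUTATIONS = [
    lambda s: s + " after passing the chair",  # rephrase / extend
    lambda s: s.replace("left", "right"),      # direction swap
]

class SeedPool:
    """Weighted seed pool: seeds that exposed failures get selected more."""

    def __init__(self, seeds):
        # score = 1 + number of failures this seed's mutants have exposed
        self.scores = {s: 1 for s in seeds}

    def select(self, rng):
        seeds = list(self.scores)
        weights = [self.scores[s] for s in seeds]
        return rng.choices(seeds, weights=weights, k=1)[0]

    def reward(self, seed):
        # feedback step: a failing mutant raises its seed's weight
        self.scores[seed] += 1

def generate_case(pool, rng):
    """Pick a seed by weight and apply a random mutation to it."""
    seed = pool.select(rng)
    mutant = rng.choice(MUTATIONS)(seed)
    return seed, mutant
```

In a full loop, each generated mutant would be executed by the agent; when it triggers a failure, `pool.reward(seed)` closes the feedback cycle so later generations concentrate on that seed's neighborhood.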

Abstract

Embodied agents in safety-critical applications such as Vision-Language Navigation (VLN) rely on multiple interdependent capabilities (e.g., perception, memory, planning, decision), making failures difficult to localize and attribute. Existing testing methods are largely system-level and provide limited insight into which capability deficiencies cause task failures. We propose a capability-oriented testing approach that enables failure detection and attribution by combining (1) adaptive test case generation via seed selection and mutation, (2) capability oracles for identifying capability-specific errors, and (3) a feedback mechanism that attributes failures to capabilities and guides further test generation. Experiments show that our method discovers more failure cases and more accurately pinpoints capability-level deficiencies than state-of-the-art baselines, providing more interpretable and actionable guidance for improving embodied agents.
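The capability oracles in (2) can be pictured as per-capability checks over an agent's execution trace, with a failed episode attributed to whichever capabilities' oracles fire. The sketch below is an assumption-laden toy, not the paper's oracles: the trace schema, the perception check (a referenced landmark never detected), and the planning check (revisiting positions as a symptom of a bad plan) are all illustrative stand-ins.

```python
# Toy capability oracles over a trace: each step is a dict recording
# what the agent expected, perceived, and where it moved.
# All field names and checks here are hypothetical examples.

def perception_oracle(trace):
    # Flag steps where an instruction-referenced landmark was expected
    # but never detected by the perception module.
    return [i for i, step in enumerate(trace)
            if step["landmark_expected"] and not step["landmark_detected"]]

def planning_oracle(trace):
    # Flag steps that revisit an already-visited position,
    # a crude symptom of looping / poor planning.
    seen, errors = set(), []
    for i, step in enumerate(trace):
        if step["position"] in seen:
            errors.append(i)
        seen.add(step["position"])
    return errors

ORACLES = {"perception": perception_oracle, "planning": planning_oracle}

def attribute_failure(trace):
    """Map a failed episode to the capabilities whose oracles fired."""
    return {cap: errs
            for cap, errs in ((cap, oracle(trace)) for cap, oracle in ORACLES.items())
            if errs}
```

The attribution dict then serves double duty: it is the interpretable diagnosis reported to developers, and it is the feedback signal that steers subsequent test generation toward the weak capability.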