Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents

arXiv cs.AI / 4/21/2026


Key Points

  • The paper introduces Mini-BEHAVIOR-Gran, a new embodied AI benchmark designed to study how instruction granularity affects language-guided agent behavior under controlled conditions.
  • Unlike prior benchmarks that use a single static instruction per task, this benchmark provides multiple instruction variants per task, from high-level goals to step-by-step guidance.
  • The authors evaluate four metrics for quantifying cross-task granularity (token count, entity count, action-verb count, and planning-width) and find planning-width correlates most consistently with agent performance.
  • When training and evaluation are organized using planning-width, the relationship between instruction granularity and performance is non-monotonic, showing a U-shaped pattern with peaks at both very fine and very coarse extremes.
  • The performance rebound at coarse granularity is attributed to shallow grounding: agents learn vision-dominant policies that rely on visual cues rather than on the instruction itself.
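Of the four candidate metrics listed above, the first three are surface statistics of the instruction text. A minimal sketch of how they might be computed is shown below; the whitespace tokenization, punctuation handling, and the verb/entity vocabularies are all illustrative assumptions, not the paper's implementation, and planning-width is omitted because it depends on the structure of the underlying task plan rather than on the instruction string alone.

```python
# Hedged sketch of three surface-level granularity metrics (token count,
# entity count, action-verb count). The word lists and tokenizer are
# assumptions for illustration only, not the benchmark's actual method.

ACTION_VERBS = {"go", "pick", "put", "place", "open", "close", "grasp"}  # assumed
ENTITIES = {"cup", "table", "drawer", "fridge", "sink"}  # assumed object vocabulary


def granularity_metrics(instruction: str) -> dict:
    # Naive tokenization: lowercase, strip simple punctuation, split on spaces.
    tokens = instruction.lower().replace(",", " ").replace(".", " ").split()
    return {
        "token_count": len(tokens),
        "entity_count": sum(t in ENTITIES for t in tokens),
        "action_verb_count": sum(t in ACTION_VERBS for t in tokens),
    }


# A coarse (goal-level) vs. fine (step-by-step) description of the same task.
coarse = "Put the cup in the drawer."
fine = ("Go to the table, pick up the cup, open the drawer, "
        "place the cup inside, close the drawer.")

print(granularity_metrics(coarse))  # {'token_count': 6, 'entity_count': 2, 'action_verb_count': 1}
print(granularity_metrics(fine))    # {'token_count': 18, 'entity_count': 5, 'action_verb_count': 5}
```

All three counts rise together as instructions get finer, which is why distinguishing their individual correlation with performance requires the controlled per-task variants the benchmark provides.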

Abstract

Instruction granularity is an important yet poorly controlled variable in language-guided embodied AI. Existing benchmarks typically pair each task with a single static instruction, making it difficult to study how agent behavior changes when the same task is described at different levels of detail. We introduce Mini-BEHAVIOR-Gran, a new benchmark for controlled studies of instruction granularity that extends Mini-BEHAVIOR with multiple instruction variants per task, ranging from high-level goal descriptions to step-by-step guidance. Using this benchmark, we compare four candidate metrics for cross-task granularity quantification (token count, entity count, action-verb count, and planning-width) and find that planning-width correlates most consistently with agent performance. Using planning-width to organize training and evaluation further reveals a non-monotonic, U-shaped relationship between instruction granularity and performance, with peaks at both the fine and coarse extremes. Further analysis suggests that the coarse-granularity performance rebound is associated with shallow grounding, where agents learn vision-dominant policies.