A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

arXiv cs.RO / 4/10/2026


Key Points

  • The paper proposes treating language-guided robotic grasping as a bounded “physical agentic loop” that operates over explicit, grounded execution states rather than single-shot action proposals.
  • It introduces a monitoring wrapper (“Watchdog”) around an unmodified grasp-and-lift manipulation primitive, using contact-aware fusion and temporal stabilization to convert noisy gripper telemetry into discrete outcome labels.
  • The loop feeds monitored outcome events (and optionally post-grasp semantic verification) into a deterministic bounded policy that can finalize the task, retry with recovery, or escalate to a user for clarification, while guaranteeing finite termination.
  • Experiments on a mobile manipulator with an eye-in-hand D405 camera show improved robustness and interpretability over open-loop grasp execution, including under visual ambiguity, distractors, and induced failures, with minimal added overhead.
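The monitoring idea in the second point can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's implementation: the label set (`empty`, `holding`, `slip`), the telemetry signals (gripper width and motor current), and all thresholds are assumptions chosen to show the two stated mechanisms, contact-aware fusion (combining width with current, since width alone is ambiguous for thin objects) and temporal stabilization (requiring K consecutive agreeing frames before emitting an event).

```python
from collections import deque

# Hypothetical outcome labels; the paper's exact label set may differ.
EMPTY, HOLDING, SLIP = "empty", "holding", "slip"

class Watchdog:
    """Sketch of an execution monitor: fuses per-frame gripper telemetry
    into a raw outcome guess, then temporally stabilizes it by requiring
    k_stable consecutive agreeing frames before emitting a discrete event."""

    def __init__(self, min_width=0.004, current_thresh=0.3, k_stable=5):
        self.min_width = min_width            # below this, fingers closed on nothing (m)
        self.current_thresh = current_thresh  # motor current suggesting contact (A)
        self.k_stable = k_stable              # frames of agreement required
        self.history = deque(maxlen=k_stable)
        self.last_label = None

    def _raw_label(self, width, current, width_delta):
        # Contact-aware fusion: width alone is ambiguous, so combine it
        # with motor current as a contact proxy.
        if width < self.min_width and current < self.current_thresh:
            return EMPTY
        if current >= self.current_thresh and width_delta < -1e-4:
            return SLIP  # width shrinking while under load: object sliding out
        return HOLDING

    def update(self, width, current, width_delta=0.0):
        """Feed one telemetry frame; return a stabilized label on a
        label change, or None while the estimate is unstable/unchanged."""
        self.history.append(self._raw_label(width, current, width_delta))
        if len(self.history) == self.k_stable and len(set(self.history)) == 1:
            label = self.history[0]
            if label != self.last_label:
                self.last_label = label
                return label  # emit a discrete outcome event
        return None
```

Emitting only on a stabilized label *change* is what turns a noisy telemetry stream into the sparse outcome events the decision layer consumes.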

Abstract

Robotic manipulation systems that follow language instructions often execute grasp primitives in a largely single-shot manner: a model proposes an action, the robot executes it, and failures such as empty grasps, slips, stalls, timeouts, or semantically wrong grasps are not surfaced to the decision layer in a structured way. Inspired by agentic loops in digital tool-using agents, we reformulate language-guided grasping as a bounded embodied agent operating over grounded execution states, where physical actions expose an explicit tool-state stream. We introduce a physical agentic loop that wraps an unmodified learned manipulation primitive (grasp-and-lift) with (i) an event-based interface and (ii) an execution monitoring layer, Watchdog, which converts noisy gripper telemetry into discrete outcome labels using contact-aware fusion and temporal stabilization. These outcome events, optionally combined with post-grasp semantic verification, are consumed by a deterministic bounded policy that finalizes, retries, or escalates to the user for clarification, guaranteeing finite termination. We validate the resulting loop on a mobile manipulator with an eye-in-hand D405 camera, keeping the underlying grasp model unchanged and evaluating representative scenarios involving visual ambiguity, distractors, and induced execution failures. Results show that explicit execution-state monitoring and bounded recovery enable more robust and interpretable behavior than open-loop execution, while adding minimal architectural overhead. For the source code and demo refer to our project page: https://wenzewwz123.github.io/Agentic-Loop/
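The abstract's "deterministic bounded policy that finalizes, retries, or escalates" and its finite-termination guarantee can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the function names (`execute_grasp`, `verify`), the event strings, and the retry budget are hypothetical, not the paper's API. The point is structural: because the primitive is invoked at most a fixed number of times, termination is bounded by construction.

```python
# Hedged sketch of the bounded decision policy around an unmodified grasp
# primitive. `execute_grasp` is assumed to return a monitored outcome label
# (e.g. from a Watchdog-style monitor); `verify` is an optional post-grasp
# semantic check ("did we grasp the object the instruction asked for?").

def bounded_policy(execute_grasp, verify=None, max_retries=2):
    """Run the grasp primitive at most (max_retries + 1) times, so the
    loop terminates after a bounded number of attempts by construction."""
    for attempt in range(max_retries + 1):
        outcome = execute_grasp()            # monitored outcome event
        if outcome == "holding":
            if verify is None or verify():   # optional semantic verification
                return "finalize"
            return "escalate"                # wrong object: ask the user
        # empty grasp / slip / stall / timeout: fall through and retry
    return "escalate"                        # retry budget exhausted
```

In this reading, "escalate" covers both semantic failures (the grasp succeeded physically but verification rejects it) and exhaustion of the recovery budget, matching the abstract's finalize/retry/escalate trichotomy.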