Belief Dynamics for Detecting Behavioral Shifts in Safe Collaborative Manipulation

arXiv cs.LG / 4/8/2026


Key Points

  • The paper addresses how robots in shared workspaces can become unsafe when a collaborating agent switches behavioral strategy mid-episode and the robot continues under outdated assumptions.
  • In ManiSkill shared-workspace manipulation tasks, across 10 regime-switch detection methods, enabling detection cuts post-switch collisions by 52%, but reliability varies widely depending on the allowed detection tolerance.
  • Under a realistic tolerance of ±3 steps, detection performance ranges from 86% down to 30%, while with a looser ±5 tolerance all methods reach 100%, highlighting practical constraints for deployment.
  • The authors propose UA-TOM, a lightweight belief-tracking module that augments frozen vision-language-action (VLA) control backbones with selective state-space dynamics, causal attention, and prediction-error signals, achieving the highest detection rate among unassisted methods (85.7% at ±3 steps) and the lowest close-range time (4.8 steps), even outperforming an Oracle baseline on that metric (5.3 steps).
  • UA-TOM’s analysis shows regime switches cause a 17x increase in hidden-state update magnitude that decays over ~10 timesteps, with inference overhead of 7.4 ms (14.8% of a 50 ms control budget), and complementary behavior verified in a cross-domain Overcooked experiment.
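The prediction-error signal described above can be illustrated with a minimal sketch: track a running baseline of the policy's prediction error and flag a regime switch when the error spikes well above that baseline. This is a hypothetical toy detector for intuition only, not the paper's UA-TOM architecture; the function name, thresholds, and synthetic trace are all assumptions.

```python
import numpy as np

def detect_regime_switch(errors, alpha=0.05, k=4.0, warmup=10):
    """Return the first timestep where prediction error exceeds k times
    its running EMA baseline (toy detector; not the paper's method)."""
    baseline = None
    for t, e in enumerate(errors):
        if baseline is None:
            baseline = e          # initialize baseline from first error
            continue
        if t >= warmup and e > k * baseline:
            return t              # spike well above baseline -> flag switch
        baseline = (1 - alpha) * baseline + alpha * e  # update EMA
    return None                   # no switch detected

# Synthetic error trace: low error for 50 steps, then a sharp jump,
# mimicking the large post-switch update magnitudes the paper reports.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.uniform(0.9, 1.1, 50),
                         rng.uniform(9.0, 11.0, 20)])
t_hat = detect_regime_switch(errors)
```

On this trace the detector fires at the step where the error distribution jumps, which is what a tolerance-window metric (e.g. ±3 steps) would score as a hit.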

Abstract

Robots operating in shared workspaces must maintain safe coordination with other agents whose behavior may change during task execution. When a collaborating agent switches strategy mid-episode, continuing under outdated assumptions can lead to unsafe actions and increased collision risk. Reliable detection of such behavioral regime changes is therefore critical. We study regime-switch detection under controlled non-stationarity in ManiSkill shared-workspace manipulation tasks. Across ten detection methods and five random seeds, enabling detection reduces post-switch collisions by 52%. However, average performance hides significant reliability differences: under a realistic tolerance of ±3 steps, detection ranges from 86% to 30%, while under ±5 steps all methods achieve 100%. We introduce UA-TOM, a lightweight belief-tracking module that augments frozen vision-language-action (VLA) control backbones using selective state-space dynamics, causal attention, and prediction-error signals. Across five seeds and 1200 episodes, UA-TOM achieves the highest detection rate among unassisted methods (85.7% at ±3) and the lowest close-range time (4.8 steps), outperforming an Oracle (5.3 steps). Analysis shows hidden-state update magnitude increases by 17x at regime switches and decays over roughly 10 timesteps, while the discretization step converges to a near-constant value (Δt ≈ 0.78), indicating sensitivity driven by learned dynamics rather than input-dependent gating. Cross-domain experiments in Overcooked show complementary roles of causal attention and prediction-error signals. UA-TOM introduces 7.4 ms inference overhead (14.8% of a 50 ms control budget), enabling reliable regime-switch detection without modifying the base policy.
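The tolerance-dependent detection rates quoted above (e.g. 86% at ±3 steps vs. 100% at ±5) can be made concrete with a small sketch of a tolerance-window metric: a detection counts as a hit only if it lands within ±tol steps of the true switch. The function name and the example episodes below are illustrative assumptions, not the paper's evaluation code.

```python
def detection_rate(detected, true_switches, tol=3):
    """Fraction of episodes whose detected switch step falls within
    +/- tol steps of the true switch step (illustrative metric;
    None means no detection in that episode)."""
    hits = sum(d is not None and abs(d - s) <= tol
               for d, s in zip(detected, true_switches))
    return hits / len(true_switches)

# Five hypothetical episodes, all with a true switch at step 50:
# one miss (None) and one late detection at step 54.
detected = [50, 52, None, 54, 60]
truths   = [50, 50, 50, 50, 50]

rate3 = detection_rate(detected, truths, tol=3)  # 54 and 60 fall outside +-3
rate5 = detection_rate(detected, truths, tol=5)  # 54 now counts; 60 still misses
```

This shows how loosening the tolerance from ±3 to ±5 can raise the measured rate without any change to the underlying detector, which is exactly the deployment caveat the abstract highlights.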