iTeach: In the Wild Interactive Teaching for Failure-Driven Adaptation of Robot Perception

arXiv cs.RO / 4/15/2026


Key Points

  • iTeach proposes a "failure-driven interactive teaching" framework: when a robot's perception model fails on out-of-distribution conditions in the field (clutter, occlusion, novel objects), those failures are collected with a human during deployment and fed back into learning.
  • A co-located human observes the model's predictions during real-world operation, identifies failure cases, and performs short HumanPlay (human-object interaction) while recording RGB-D video, efficiently capturing informative object configurations.
  • To reduce labeling effort, FS3 (Few-Shot Semi-Supervised) annotates only the final frame of each short interaction sequence using eye-gaze input and voice commands, then propagates those labels across the whole video to produce dense supervision.
  • For unseen object instance segmentation (UOIS), iterative fine-tuning on a small number of failure-driven samples substantially improves segmentation across diverse real-world environments, even starting from a pretrained MSMFormer.
  • The gains carry over directly to downstream manipulation: higher grasping and pick-and-place success rates on the SceneReplica benchmark and in real-robot experiments.
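The FS3 idea above (annotate only the last frame, then propagate labels backward through the video) can be sketched in a toy form. This is not the paper's implementation: it assumes small inter-frame motion and uses a brute-force integer-shift estimate in pure NumPy in place of a real tracker or optical-flow model; the function names are illustrative.

```python
import numpy as np

def estimate_shift(frame_a, frame_b, max_shift=3):
    """Brute-force search for the integer (dy, dx) translation that best
    aligns frame_b to frame_a (minimum sum of squared differences)."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(frame_b, (dy, dx), axis=(0, 1))
            err = np.sum((frame_a - shifted) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def propagate_labels(frames, final_mask, max_shift=3):
    """Propagate the mask annotated on the final frame backward through
    the sequence, yielding a dense pseudo-label for every frame."""
    masks = [None] * len(frames)
    masks[-1] = final_mask
    for t in range(len(frames) - 2, -1, -1):
        # Align frame t+1 to frame t, then move the mask the same way.
        dy, dx = estimate_shift(frames[t], frames[t + 1], max_shift)
        masks[t] = np.roll(masks[t + 1], (dy, dx), axis=(0, 1))
    return masks
```

With one annotated frame, this turns a short HumanPlay clip into dense per-frame supervision; a real system would replace the shift search with learned flow or mask tracking.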

Abstract

Robotic perception models often fail when deployed in real-world environments due to out-of-distribution conditions such as clutter, occlusion, and novel object instances. Existing approaches address this gap through offline data collection and retraining, which are slow and do not resolve deployment-time failures. We propose iTeach, a failure-driven interactive teaching framework for adapting robot perception in the wild. A co-located human observes model predictions during deployment, identifies failure cases, and performs short human-object interaction (HumanPlay) to expose informative object configurations while recording RGB-D sequences. To minimize annotation effort, iTeach employs a Few-Shot Semi-Supervised (FS3) labeling strategy, where only the final frame of a short interaction sequence is annotated using hands-free eye-gaze and voice commands, and labels are propagated across the video to produce dense supervision. The collected failure-driven samples are used for iterative fine-tuning, enabling progressive deployment-time adaptation of the perception model. We evaluate iTeach on unseen object instance segmentation (UOIS) starting from a pretrained MSMFormer model. Using a small number of failure-driven samples, our method significantly improves segmentation performance across diverse real-world scenes. These improvements directly translate to higher grasping and pick-and-place success on the SceneReplica benchmark and real robotic experiments. Our results demonstrate that failure-driven, co-located interactive teaching enables efficient in-the-wild adaptation of robot perception and improves downstream manipulation performance. Project page at https://irvlutd.github.io/iTeach
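The overall failure-driven adaptation loop the abstract describes can be sketched as below. Everything here is a stand-in: `ThresholdSegmenter` is a toy model (the paper fine-tunes MSMFormer), the stream of `(image, human_mask)` pairs abstracts the HumanPlay plus FS3 labeling step, and the IoU failure test is an assumed trigger, not the paper's criterion.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

class ThresholdSegmenter:
    """Toy perception model: segments pixels brighter than a threshold.
    A real system would fine-tune a network such as MSMFormer instead."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, image):
        return (image > self.threshold).astype(np.uint8)

    def fine_tune(self, images, masks):
        # Grid-search the threshold that maximizes total IoU on the
        # collected failure samples (stands in for gradient fine-tuning).
        candidates = np.linspace(0.0, 1.0, 101)
        best = max(candidates, key=lambda t: sum(
            iou((img > t).astype(np.uint8), m)
            for img, m in zip(images, masks)))
        self.threshold = best

def deployment_loop(model, stream, iou_fail=0.75):
    """Failure-driven adaptation: keep only scenes where the model fails,
    take the human-provided mask, and fine-tune after each new failure."""
    buf_imgs, buf_masks = [], []
    for image, human_mask in stream:
        if iou(model.predict(image), human_mask) < iou_fail:
            buf_imgs.append(image)
            buf_masks.append(human_mask)
            model.fine_tune(buf_imgs, buf_masks)  # iterative update
    return model
```

The key property this sketch preserves is that data collection is selective: only deployment-time failures enter the fine-tuning buffer, so a handful of samples can correct the model where it actually breaks.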