Good in Bad (GiB): Sifting Through End-user Demonstrations for Learning a Better Policy
arXiv cs.RO · May 5, 2026
📰 News · Models & Research
Key Points
- The paper addresses a key imitation-learning challenge: demonstrations collected from non-expert users often include errors that can make learned robot policies unsafe or degrade performance.
- It proposes GiB (Good in Bad), an algorithm that automatically filters demonstrations at the subtask level, identifying and discarding only the erroneous segments while retaining the high-quality parts.
- GiB uses a two-stage approach: it trains a self-supervised model to learn latent features and a binary classifier to mark segments as good or bad.
- It then fits the distribution of latent features from good-quality segments and uses the Mahalanobis distance to flag low-quality subtasks as outliers (a minimal sketch of this stage follows the list).
- Experiments with a Franka robot on simulated and real multi-step tasks show that GiB improves policy performance when training on mixed-quality human demonstrations.
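
To make the second stage concrete, here is a minimal sketch in Python of Mahalanobis-distance filtering over latent features: fit a Gaussian to the features of segments the classifier labeled good, then discard segments whose distance from that distribution exceeds a threshold. The encoder is stubbed out with an identity function, and the feature dimensionality and threshold value are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_good_distribution(good_feats):
    """Fit a Gaussian (mean + inverse covariance) over the latent
    features of segments the classifier labeled as good."""
    mu = good_feats.mean(axis=0)
    # A small ridge keeps the covariance invertible when samples are few.
    cov = np.cov(good_feats, rowvar=False) + 1e-6 * np.eye(good_feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(feat, mu, cov_inv):
    """Distance of one segment's latent feature from the 'good'
    distribution; larger values indicate more anomalous segments."""
    d = feat - mu
    return float(np.sqrt(d @ cov_inv @ d))

def filter_segments(segments, encode, mu, cov_inv, threshold):
    """Keep segments whose encoded feature lies within `threshold` of
    the good-quality distribution; discard the rest."""
    return [s for s in segments
            if mahalanobis(encode(s), mu, cov_inv) <= threshold]

# Toy usage: random vectors stand in for learned latent features.
rng = np.random.default_rng(0)
good_feats = rng.normal(size=(200, 16))   # features of "good" segments
mu, cov_inv = fit_good_distribution(good_feats)

candidates = [rng.normal(size=16) for _ in range(5)]
kept = filter_segments(candidates, encode=lambda s: s,
                       mu=mu, cov_inv=cov_inv, threshold=6.0)
print(f"kept {len(kept)} of {len(candidates)} segments")
```

In practice the threshold would likely be tuned rather than fixed, for example by taking a high quantile of Mahalanobis distances computed on held-out good segments, though the paper's exact criterion may differ.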