From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback
arXiv cs.RO / 5/1/2026
Key Points
- Behavior cloning (BC) is brittle when demonstrations contain imperfect or noisy actions because standard pointwise action-label supervision can push learned policies away from the true desired behavior.
- The paper proposes CLIC (Contrastive policy Learning from Interactive Corrections), which replaces single-action targets with set-valued action targets derived from human corrective feedback.
- CLIC trains policies to assign probability mass over sets of desirable actions, enabling the method to handle both absolute and relative corrections and to capture multi-modal behavior.
- Experiments in both simulation and on real robots indicate that CLIC matches state-of-the-art performance when feedback is accurate, while offering substantially improved robustness to noisy, partial, and relative feedback.
- The authors make their implementation publicly available, facilitating reproduction and further research use.
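The core idea — supervising a policy with a *set* of desirable actions rather than a single action label — can be sketched as an InfoNCE-style contrastive objective with multiple positives. The sketch below is illustrative only and is not the paper's implementation: the function name, the energy-based scoring view, and the toy numbers are assumptions made here for clarity.

```python
import numpy as np

def set_contrastive_loss(energies, desirable_mask):
    """Contrastive loss with a *set* of positive actions (illustrative sketch).

    energies: shape (N,), scores f(s, a_i) for N candidate actions at state s
    desirable_mask: shape (N,), True where a_i lies in the desirable set

    Loss = -log( sum_{i in desirable} exp(e_i) / sum_{all i} exp(e_i) ),
    i.e. the policy is pushed to place probability mass on the whole
    desirable set, not on one pointwise action label.
    """
    e = energies - energies.max()          # subtract max for numerical stability
    exp_e = np.exp(e)
    p_desirable = exp_e[desirable_mask].sum() / exp_e.sum()
    return -np.log(p_desirable)

# Toy example: 5 candidate actions; a corrective signal marks
# actions 1 and 3 as desirable (a set-valued target).
energies = np.array([0.1, 2.0, -1.0, 1.5, 0.0])
mask = np.array([False, True, False, True, False])
loss = set_contrastive_loss(energies, mask)
```

Raising the energies of the desirable actions lowers the loss, and the loss vanishes when all probability mass already sits on the desirable set — which is how a set-valued target can tolerate multi-modal or imprecise corrections that would conflict under pointwise supervision.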