Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language
arXiv cs.RO / 4/1/2026
Key Points
- The paper tackles a key limitation in reward learning from demonstrations: with limited data, reward models can overfit to spurious correlations because demos show behavior without clarifying which aspects of the state truly matter.
- It proposes Masked Inverse Reinforcement Learning (Masked IRL), which uses large language models to infer which state components are relevant to a given natural-language instruction (see the mask-inference sketch after this list).
- Masked IRL enforces invariance to irrelevant state details, aiming to improve generalization beyond what demonstrations alone can provide (see the invariance sketch below).
- When instructions are ambiguous, the framework uses LLM reasoning to interpret them in the context of the demonstrations, narrowing the set of reward functions consistent with the data (see the disambiguation sketch below).
- Experiments in simulation and on a real robot show Masked IRL outperforms prior language-conditioned IRL methods by up to 15% while requiring up to 4.7× less data, improving sample efficiency and robustness.
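
Below is a minimal sketch of the mask-inference step, assuming a hypothetical `query_llm(prompt) -> str` helper around whatever chat model is available; the state components, prompt wording, and JSON output format are illustrative assumptions, not the paper's exact setup.

```python
import json

# Illustrative state layout for a tabletop manipulation task (assumption).
STATE_COMPONENTS = ["ee_pos", "ee_orient", "dist_to_human", "dist_to_table", "cup_tilt"]

def infer_state_mask(instruction: str, query_llm) -> list[int]:
    """Ask the LLM which state components the instruction makes reward-relevant."""
    prompt = (
        "A robot's state has these components: "
        + ", ".join(STATE_COMPONENTS)
        + f'.\nInstruction: "{instruction}"\n'
        "Return only a JSON list of 0/1 flags, one per component, where 1 means "
        "the component is relevant to the reward implied by the instruction."
    )
    mask = json.loads(query_llm(prompt))
    assert len(mask) == len(STATE_COMPONENTS)
    return mask

# e.g. "stay far from me and keep the cup level" might yield [0, 0, 1, 0, 1]:
# only distance-to-human and cup tilt are marked relevant.
```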
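One plausible way to realize the invariance the second point describes is an augmentation-style penalty: resample the masked-out dimensions and require the learned reward to be unchanged. A PyTorch sketch under that assumption (the paper's actual objective may differ):

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP reward model over flat state vectors."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)

def masked_invariance_loss(reward: RewardNet, states: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Penalize reward changes when irrelevant dims are resampled.

    `mask` is a float tensor of per-dimension 0/1 flags (1 = relevant),
    e.g. torch.tensor(infer_state_mask(...), dtype=torch.float32).
    """
    noise = torch.randn_like(states)
    perturbed = states * mask + noise * (1 - mask)  # keep only relevant dims fixed
    return ((reward(states) - reward(perturbed)) ** 2).mean()
```

In training, a term like this would be added to whatever demonstration-matching IRL loss is used, so the learned reward can only depend on the dimensions the LLM marked relevant.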
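For the disambiguation step, one sketch of the idea is to let the LLM rewrite an ambiguous instruction in light of a textual summary of the demonstrations before masking; the prompt and summary format here are assumptions.

```python
def disambiguate_instruction(instruction: str, demo_summary: str, query_llm) -> str:
    """Resolve an ambiguous instruction against what the demonstrations show."""
    prompt = (
        f'Instruction: "{instruction}"\n'
        f"Demonstration summary: {demo_summary}\n"
        "If the instruction is ambiguous, rewrite it as the single most likely "
        "intended instruction consistent with the demonstrations. "
        "Otherwise return it unchanged."
    )
    return query_llm(prompt).strip()

# e.g. "be careful" plus demos that always keep the cup level might resolve to
# "keep the cup upright while moving", which then feeds infer_state_mask above.
```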