The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes
arXiv cs.AI / 2026/3/24
Key Points
- The paper addresses “intelligent disobedience” in shared autonomy, where an assistive AI may need to override a human instruction to prevent harm.
- It proposes the Intelligent Disobedience Game (IDG), a sequential Stackelberg-style framework that models human leadership under asymmetric information and derives optimal strategies for both agents.
- The analysis identifies key strategic phenomena such as “safety traps,” where the system avoids harm indefinitely but may fail to accomplish the human’s intended goal.
- The work translates the IDG into a shared-control Multi-Agent Markov Decision Process, creating a compact computational testbed for training reinforcement learning agents to learn safe non-compliance.
- The authors position the IDG as both a theoretical foundation for agent development and an experimental platform for studying how humans perceive and trust disobedient AI.
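
The shared-control setting described above can be illustrated with a toy simulation. The sketch below is not the paper's actual formalism; it is a minimal, hypothetical environment (states, hazard position, and probabilities invented for illustration) in which a human repeatedly commands "move right" toward a goal, the AI privately observes a danger flag the human cannot see (asymmetric information), and each step the AI chooses to comply or disobey. An always-disobeying policy reproduces the "safety trap": it never causes harm but never reaches the goal.

```python
import random

GOAL, HAZARD = 5, 3  # positions on a 1-D line of states 0..5 (illustrative values)

def step(pos, danger, comply, rng):
    """One shared-control transition: the human commands 'move right';
    the AI either complies (advances) or disobeys (holds position)."""
    nxt = pos + 1 if comply else pos
    harmed = (nxt == HAZARD and danger)  # harm only if the hazard is active
    danger = rng.random() < 0.3          # danger flag resampled each step (AI-observable only)
    return nxt, danger, harmed

def run_episode(policy, rng, horizon=30):
    pos, danger = 0, rng.random() < 0.3
    for _ in range(horizon):
        comply = policy(pos, danger)
        pos, danger, harmed = step(pos, danger, comply, rng)
        if harmed:
            return "harm"
        if pos == GOAL:
            return "goal"
    return "timeout"  # the "safety trap" outcome: no harm, but no goal either

def evaluate(policy, n=1000, seed=0):
    rng = random.Random(seed)
    results = [run_episode(policy, rng) for _ in range(n)]
    return {k: results.count(k) / n for k in ("goal", "harm", "timeout")}

# Three candidate AI policies over (position, observed danger flag):
always_comply  = lambda pos, danger: True
always_disobey = lambda pos, danger: False                           # safety trap
intelligent    = lambda pos, danger: not (pos + 1 == HAZARD and danger)  # disobey only when compliance is unsafe

for name, pol in [("comply", always_comply),
                  ("disobey", always_disobey),
                  ("intelligent", intelligent)]:
    print(name, evaluate(pol))
```

Running the comparison shows the trade-off the paper formalizes: blind compliance accrues harm, blanket disobedience times out every episode, and the intelligently disobedient policy avoids all harm while still reaching the goal.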
