The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes
arXiv cs.AI / 2026/3/24
Key Points
- The paper addresses “intelligent disobedience” in shared autonomy, where an assistive AI may need to override a human instruction to prevent harm.
- It proposes the Intelligent Disobedience Game (IDG), a sequential Stackelberg-style framework that models human leadership under asymmetric information and derives optimal strategies for both agents.
- The analysis identifies key strategic phenomena such as “safety traps,” where the system avoids harm indefinitely but may fail to accomplish the human’s intended goal.
- The work translates the IDG into a shared-control Multi-Agent Markov Decision Process, creating a compact computational testbed for training reinforcement learning agents to learn safe non-compliance.
- The authors position the IDG as both a theoretical foundation for agent development and an experimental platform for studying how humans perceive and trust disobedient AI.
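
The shared-control setting described above can be illustrated with a toy simulation. The sketch below is not the paper's actual formalism; it is a minimal, hypothetical environment (states, hazard position, and probabilities invented for illustration) in which a human repeatedly commands "move right" toward a goal, the AI privately observes a danger flag the human cannot see (asymmetric information), and each step the AI chooses to comply or disobey. An always-disobeying policy reproduces the "safety trap": it never causes harm but never reaches the goal.

```python
import random

GOAL, HAZARD = 5, 3  # positions on a 1-D line of states 0..5 (illustrative values)

def step(pos, danger, comply, rng):
    """One shared-control transition: the human commands 'move right';
    the AI either complies (advances) or disobeys (holds position)."""
    nxt = pos + 1 if comply else pos
    harmed = (nxt == HAZARD and danger)  # harm only if the hazard is active
    danger = rng.random() < 0.3          # danger flag resampled each step (AI-observable only)
    return nxt, danger, harmed

def run_episode(policy, rng, horizon=30):
    pos, danger = 0, rng.random() < 0.3
    for _ in range(horizon):
        comply = policy(pos, danger)
        pos, danger, harmed = step(pos, danger, comply, rng)
        if harmed:
            return "harm"
        if pos == GOAL:
            return "goal"
    return "timeout"  # the "safety trap" outcome: no harm, but no goal either

def evaluate(policy, n=1000, seed=0):
    rng = random.Random(seed)
    results = [run_episode(policy, rng) for _ in range(n)]
    return {k: results.count(k) / n for k in ("goal", "harm", "timeout")}

# Three candidate AI policies over (position, observed danger flag):
always_comply  = lambda pos, danger: True
always_disobey = lambda pos, danger: False                           # safety trap
intelligent    = lambda pos, danger: not (pos + 1 == HAZARD and danger)  # disobey only when compliance is unsafe

for name, pol in [("comply", always_comply),
                  ("disobey", always_disobey),
                  ("intelligent", intelligent)]:
    print(name, evaluate(pol))
```

Running the comparison shows the trade-off the paper formalizes: blind compliance accrues harm, blanket disobedience times out every episode, and the intelligently disobedient policy avoids all harm while still reaching the goal.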
