Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems

arXiv cs.AI / 5/1/2026


Key Points

  • The paper introduces Safe Bilevel Delegation (SBD), a formal runtime framework for making safe delegation decisions in hierarchical multi-agent systems where LLM agents operate in high-stakes settings.
  • SBD models delegation as a bilevel optimization problem, using an outer meta-weight network to learn context-dependent safety–efficiency trade-off weights and an inner loop that enforces a probabilistic safety constraint.
  • A continuous delegation degree parameter (alpha) smoothly interpolates control between full human override and fully autonomous execution based on task context.
  • The authors prove three theoretical results, including safety monotonicity, convergence of the inner optimization via projected gradient descent, and an accountability propagation bound across multi-hop delegation chains.
  • They plan empirical validation across three high-stakes domains (medical AI, financial risk control, and educational supervision) using specified datasets, safety constraint sets, baselines, and evaluation protocols.
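The inner loop and the delegation degree described above can be illustrated with a toy problem. The sketch below is not the paper's algorithm: it assumes a one-dimensional policy (the delegation degree alpha itself), quadratic safety and efficiency costs, and a linear risk model in which P(safe) = 1 - risk_rate * alpha, so that a safety constraint with tolerance delta reduces to alpha <= delta / risk_rate. The names `project`, `inner_pgd`, and `risk_rate` are hypothetical.

```python
def project(alpha: float, upper: float) -> float:
    """Euclidean projection of alpha onto the feasible interval [0, upper]."""
    return min(max(alpha, 0.0), upper)


def inner_pgd(lam: float, delta: float, risk_rate: float = 0.5,
              step: float = 0.25, iters: int = 200) -> float:
    """Projected gradient descent on a toy inner problem.

    Assumed (illustrative) cost:
        f(alpha) = lam * alpha**2 + (1 - lam) * (1 - alpha)**2
    where lam in [0, 1] weights safety against efficiency.
    Assumed risk model: P(safe) = 1 - risk_rate * alpha, so requiring
    P(safe) >= 1 - delta means alpha <= delta / risk_rate.
    """
    upper = min(1.0, delta / risk_rate)  # feasible set is [0, upper]
    alpha = 0.0                          # start from full human override
    for _ in range(iters):
        # Gradient of the assumed quadratic cost with respect to alpha.
        grad = 2.0 * lam * alpha - 2.0 * (1.0 - lam) * (1.0 - alpha)
        # Gradient step followed by projection back onto the feasible set.
        alpha = project(alpha - step * grad, upper)
    return alpha


if __name__ == "__main__":
    # Loose constraint: the solution is the unconstrained minimizer 1 - lam.
    print(inner_pgd(lam=0.3, delta=0.5))
    # Tight constraint: the projection caps the delegation degree.
    print(inner_pgd(lam=0.3, delta=0.1))
    # Higher safety weight: less authority is delegated.
    print(inner_pgd(lam=0.8, delta=0.5))
```

In this toy setting the cost is strongly convex and smooth, so the iterates contract toward the solution at a fixed rate, and raising `lam` never increases the returned alpha. Both behaviors mirror the convergence and monotonicity claims only in spirit; the paper's actual assumptions and proofs are not reproduced here.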

Abstract

As large language model (LLM) agents are deployed in high-stakes environments, the question of how safely to delegate subtasks to specialized sub-agents becomes critical. Existing work addresses multi-agent architecture selection at design time or provides broad empirical guidelines, but neither provides a runtime mechanism that dynamically adjusts the safety-efficiency trade-off as task context changes during execution. We propose Safe Bilevel Delegation (SBD), a formal framework for runtime delegation safety in hierarchical multi-agent systems. SBD formulates task delegation as a bilevel optimization problem: an outer meta-weight network phi learns context-dependent safety-efficiency weights lambda(s) in [0,1]; an inner loop optimizes the delegation policy pi subject to a probabilistic safety constraint P(safe) >= 1-delta. The continuous delegation degree alpha in [0, 1] controls how much decision authority is transferred to each sub-agent, interpolating smoothly between full human override (alpha=0) and fully autonomous execution (alpha=1). We establish three theoretical results: (1) Safety Monotonicity--higher outer safety weight produces a weakly safer inner policy; (2) Inner Policy Convergence--projected gradient descent on the inner problem converges linearly under standard smoothness assumptions; (3) an Accountability Propagation bound that distributes responsibility across multi-hop delegation chains with a provable per-agent ceiling. We instantiate SBD in three high-stakes domains--medical AI (MIMIC-III), financial risk control (S&P 500), and educational agent supervision (ASSISTments)--specifying datasets, safety constraint sets, baselines, and evaluation protocols. This manuscript presents the formal framework and theoretical results in full; empirical validation following the protocols described herein is planned and will be reported in a forthcoming revision.
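One way to read the bilevel structure described in the abstract is sketched below. The abstract does not give the objective functions, so the outer loss $\mathcal{L}_{\text{outer}}$ and the safety and efficiency utilities $U_{\text{safe}}$ and $U_{\text{eff}}$ are placeholder symbols assumed for illustration; only $\phi$, $\lambda(s)$, $\pi$, and $\delta$ come from the source.

```latex
\begin{aligned}
\text{(outer)}\quad
  & \min_{\phi}\; \mathbb{E}_{s}\!\left[ \mathcal{L}_{\text{outer}}\big(\pi^{*}_{\phi}, s\big) \right] \\
\text{(inner)}\quad
  & \pi^{*}_{\phi} \in \arg\max_{\pi}\;
    \mathbb{E}_{s}\!\left[ \lambda_{\phi}(s)\, U_{\text{safe}}(\pi, s)
      + \big(1 - \lambda_{\phi}(s)\big)\, U_{\text{eff}}(\pi, s) \right] \\
  & \text{s.t.}\quad P(\text{safe} \mid \pi) \ge 1 - \delta,
    \qquad \lambda_{\phi}(s) \in [0, 1].
\end{aligned}
```

Under this reading, the outer network only shapes the weight $\lambda_{\phi}(s)$, while the safety constraint is enforced inside the inner problem regardless of the weight chosen.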