Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

arXiv stat.ML / 4/13/2026


Key Points

  • The paper studies Learning-to-Defer and shows that the standard formulation, which assumes each expert's information is fixed at decision time, breaks down when the system can also choose additional “advice” (e.g., retrieved documents, tool outputs, or escalation context) after routing to an expert.
  • It proves that a broad family of separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting.
  • The authors propose an augmented surrogate that treats routing-plus-advice as a composite action space and proves an H-consistency guarantee with an excess-risk transfer bound, implying convergence to the Bayes-optimal policy in the limit.
  • Experiments across tabular, language, and multimodal tasks indicate that the augmented method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime.
  • A synthetic benchmark reproduces the predicted failure mode of separated surrogates, supporting the theoretical analysis.
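The gap between deciding jointly and deciding per head can be seen in a toy example. This is an illustrative sketch under stated assumptions, not the paper's construction or proof: the cost table is hypothetical (conditional expected costs for one input, 2 experts × 2 advice options), and `separated_decision` is one hypothetical decoupled rule in which each head minimizes cost averaged uniformly over the other head's options. `joint_decision` is the Bayes-optimal rule on the composite (expert, advice) action space when expected costs are known.

```python
import numpy as np

# Hypothetical conditional expected costs E[c(x, j, a) | x] for a single input x.
# Rows index experts j; columns index advice options a
# (e.g. a = 0: no extra context, a = 1: retrieved documents).
expected_cost = np.array([
    [0.10, 0.60],  # expert 0
    [0.40, 0.20],  # expert 1
])


def joint_decision(cost):
    """Bayes-optimal rule: argmin over the composite (expert, advice) space."""
    j, a = np.unravel_index(np.argmin(cost), cost.shape)
    return int(j), int(a)


def separated_decision(cost):
    """Hypothetical decoupled rule: each head minimises the cost averaged
    uniformly over the other head's options, ignoring their interaction."""
    j = int(np.argmin(cost.mean(axis=1)))  # routing head: average over advice
    a = int(np.argmin(cost.mean(axis=0)))  # advice head: average over experts
    return j, a


if __name__ == "__main__":
    jj, ja = joint_decision(expected_cost)
    sj, sa = separated_decision(expected_cost)
    print("joint:", (jj, ja), "cost", expected_cost[jj, ja])
    print("separated:", (sj, sa), "cost", expected_cost[sj, sa])
```

With this table the joint rule picks (expert 0, no advice) at cost 0.10. The decoupled rule prefers expert 1 (row averages 0.35 vs 0.30) and no advice (column averages 0.25 vs 0.40), landing on (expert 1, no advice) at cost 0.40, because averaging discards the expert-advice interaction.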

Abstract

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert-advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
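To make "a surrogate on the composite action space" concrete, here is a minimal sketch. The abstract does not give the paper's surrogate, so the loss below is an assumption: a cost-weighted log-softmax over one score per (expert, advice) pair, with costs assumed normalized to [0, 1]. All function names (`composite_actions`, `augmented_surrogate`, `predict`) are hypothetical and are not the authors' code.

```python
import numpy as np


def composite_actions(n_experts, n_advice):
    """Enumerate the composite action space as (expert, advice) pairs."""
    return [(j, a) for j in range(n_experts) for a in range(n_advice)]


def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))


def augmented_surrogate(scores, costs):
    """Hypothetical cost-weighted log-softmax loss over composite actions.

    scores: shape (K,), one score per (expert, advice) pair.
    costs:  shape (K,), realised costs in [0, 1] for the same pairs.
    Each pair's negative log-probability is weighted by (1 - cost), so
    cheap pairs contribute the most and attract probability mass.
    """
    return float(np.sum((1.0 - costs) * -log_softmax(scores)))


def predict(scores, actions):
    """Decision rule: return the highest-scoring (expert, advice) pair."""
    return actions[int(np.argmax(scores))]


if __name__ == "__main__":
    actions = composite_actions(n_experts=2, n_advice=2)
    costs = np.array([0.10, 0.60, 0.40, 0.20])  # same order as `actions`
    favours_cheap = np.array([2.0, 0.0, 0.0, 0.0])
    favours_costly = np.array([0.0, 0.0, 2.0, 0.0])
    print(augmented_surrogate(favours_cheap, costs))
    print(augmented_surrogate(favours_costly, costs))
    print(predict(favours_cheap, actions))
```

For a fixed input, the expected loss is linear in the costs, so minimizing it over the probability simplex gives p_k proportional to 1 − E[c_k] (assuming not every expected cost equals 1); the highest-probability pair is therefore the one with the lowest expected cost. That pointwise property is why a joint surrogate of this kind is a natural candidate. The paper's actual $\mathcal{H}$-consistency guarantee and excess-risk bound concern its own surrogate and are not reproduced by this sketch.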