Decision-Centric Design for LLM Systems

arXiv cs.AI / 4/2/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that LLM systems must not only generate text but also make explicit control decisions (e.g., answer vs. clarify vs. retrieve vs. tool-call vs. repair vs. escalate).
  • It identifies a common limitation in current architectures: decision logic is implicitly entangled with generation, making failures difficult to inspect, constrain, or recover from.
  • The proposed decision-centric framework separates decision-relevant signals from the policy that maps those signals to actions, making control an explicit and inspectable system layer.
  • The framework improves debuggability by enabling attribution of failures to specific components such as signal estimation, decision policy, or execution, rather than treating everything as one opaque step.
  • Experiments show the approach reduces futile actions and boosts task success while producing more interpretable failure modes, and it generalizes to both single-step and sequential action settings.
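The separation described above can be sketched as a small control layer: decision-relevant signals are estimated explicitly, and a separate, inspectable policy maps them to a control action. This is a minimal illustration, not the paper's implementation; the signal names, thresholds, and action set are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    # Control actions mentioned in the paper's framing
    ANSWER = auto()
    CLARIFY = auto()
    RETRIEVE = auto()
    ESCALATE = auto()

@dataclass
class Signals:
    # Hypothetical decision-relevant signals, estimated separately
    # from generation (e.g. by calibration probes or classifiers)
    answer_confidence: float   # estimated confidence in a direct answer
    query_ambiguity: float     # how underspecified the request is
    needs_external_info: bool  # whether retrieval is likely required

def decide(s: Signals) -> Action:
    """Explicit policy mapping signals to a control action.
    Thresholds are illustrative, not from the paper."""
    if s.query_ambiguity > 0.7:
        return Action.CLARIFY
    if s.needs_external_info:
        return Action.RETRIEVE
    if s.answer_confidence >= 0.6:
        return Action.ANSWER
    return Action.ESCALATE
```

Because the policy is a plain function over named signals, it can be unit-tested, constrained, or swapped out without retraining the generator, which is the inspectability the paper argues for.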

Abstract

LLM systems must make control decisions in addition to generating outputs: whether to answer, clarify, retrieve, call tools, repair, or escalate. In many current architectures, these decisions remain implicit within generation, entangling assessment and action in a single model call and making failures hard to inspect, constrain, or repair. We propose a decision-centric framework that separates decision-relevant signals from the policy that maps them to actions, turning control into an explicit and inspectable layer of the system. This separation supports attribution of failures to signal estimation, decision policy, or execution, and enables modular improvement of each component. It unifies familiar single-step settings such as routing and adaptive inference, and extends naturally to sequential settings in which actions alter the information available before acting. Across three controlled experiments, the framework reduces futile actions, improves task success, and reveals interpretable failure modes. More broadly, it offers a general architectural principle for building more reliable, controllable, and diagnosable LLM systems.