ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control

arXiv cs.LG / 3/31/2026


Key Points

  • ATLAS-RTC is introduced as a runtime control system for autoregressive LLMs that enforces structured output during token-by-token decoding.
  • The method uses lightweight monitoring signals to detect drift from predefined output contracts and then applies interventions such as biasing, masking, or rollback within a closed loop.
  • Compared with post-hoc validation or static constrained decoding, ATLAS-RTC aims to prevent errors by correcting the generation in progress before those errors fully manifest.
  • Experiments on structured generation and tool-calling tasks show large gains in first-attempt success rates (20 to 37.8 percentage points) and substantial latency reductions in failure-heavy scenarios (up to 88%).
  • The authors argue that many observed failures stem from decoding artifacts rather than true task misunderstanding, positioning runtime control as a separate, important layer for reliable LLM systems.
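
The summary does not describe ATLAS-RTC's actual monitoring signals or intervention policy. What follows is therefore only a minimal sketch of the general closed-loop idea, built on our own assumptions. It uses a toy model with two scripted decoding artifacts and an output contract written as a fixed token template. The drift signal is the probability mass the model places on contract-valid tokens. The escalation rule is hypothetical: apply a soft bias below one threshold, apply a hard mask below a lower threshold, and roll back a single token if a violating token still gets committed. None of the names, thresholds, or helpers come from the paper.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical output contract: the exact token sequence a tool call must follow.
# A real system would more likely check a grammar or JSON schema.
CONTRACT = ["{", '"name"', ":", '"search"', "}", "<eos>"]
VOCAB = sorted(set(CONTRACT) | {"Sure", "!"})


def allowed_next(prefix: Sequence[str]) -> Set[str]:
    """Tokens the contract permits after `prefix`; empty if the prefix already violates it."""
    n = len(prefix)
    if n >= len(CONTRACT) or list(prefix) != CONTRACT[:n]:
        return set()
    return {CONTRACT[n]}


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Stand-in for an LLM's next-token logits, with two scripted decoding artifacts."""
    logits = {t: 0.0 for t in VOCAB}
    for t in allowed_next(prefix):
        logits[t] = 2.0
    if len(prefix) == 0:
        logits["Sure"] = 5.0  # chatty preamble that would break the contract
    if len(prefix) == 3:
        logits["!"] = 3.5  # stray token competing with the argument value
    return logits


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def controlled_decode(
    bias_below: float = 0.5,  # hypothetical: soft-bias when valid mass < this
    mask_below: float = 0.1,  # hypothetical: hard-mask when valid mass < this
    beta: float = 1.0,  # hypothetical logit boost for contract-valid tokens
    max_steps: int = 32,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Greedy decoding wrapped in a monitor -> intervene -> verify loop."""
    out: List[str] = []
    banned: Dict[int, Set[str]] = {}  # tokens rejected by rollback, per position
    log: List[Tuple[int, str]] = []  # (position, intervention) pairs
    for _ in range(max_steps):
        if out and out[-1] == "<eos>":
            break
        pos = len(out)
        valid = allowed_next(out)
        blocked = banned.get(pos, set())
        logits = {t: v for t, v in toy_model(out).items() if t not in blocked}
        # Monitor: probability mass the model currently puts on contract-valid tokens.
        valid_mass = sum(p for t, p in softmax(logits).items() if t in valid)
        if valid_mass < mask_below:
            logits = {t: v for t, v in logits.items() if t in valid}  # hard mask
            log.append((pos, "mask"))
        elif valid_mass < bias_below:
            logits = {t: v + beta if t in valid else v for t, v in logits.items()}  # soft bias
            log.append((pos, "bias"))
        token = max(logits, key=logits.get)  # greedy pick
        out.append(token)
        # Verify: if the committed token broke the contract, roll it back and ban it here.
        if token not in valid:
            out.pop()
            blocked.add(token)
            banned[pos] = blocked
            log.append((pos, "rollback"))
    return out, log


if __name__ == "__main__":
    tokens, log = controlled_decode()
    print(" ".join(tokens))
    print(log)
```

Running it prints `{ "name" : "search" } <eos>` followed by `[(0, 'mask'), (3, 'bias'), (3, 'rollback')]`. At position 0 the valid mass is tiny, so the mask suppresses the "Sure" preamble that plain greedy decoding would emit first. At position 3 the soft bias is too weak, so the loop commits the stray "!", detects the violation, rolls it back, and re-decodes. Escalating from bias to mask keeps interventions light when drift is mild. Banning a rolled-back token at its position ensures the loop cannot repeat the same mistake in this sketch.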

Abstract

We present ATLAS-RTC, a runtime control system for autoregressive language models that enforces structured output during decoding. ATLAS-RTC monitors generation at each step, detects drift from output contracts using lightweight signals, and applies targeted interventions such as biasing, masking, and rollback. Unlike post-hoc validation or static constrained decoding, it operates in a closed loop, enabling correction before errors materialize. Across structured generation and tool-calling tasks, ATLAS-RTC improves first-attempt success rates by 20 to 37.8 percentage points, with up to 88% latency reduction in failure-dominated settings. Results show that many failures arise from decoding artifacts rather than task misunderstanding, motivating runtime control as a distinct layer in LLM systems.
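
A back-of-envelope model makes the latency claim easier to interpret. The following is our assumption, not the paper's measurement setup. Suppose a post-hoc pipeline validates each complete output and regenerates it on failure, and that attempts are independent, each succeeding with probability p and taking the same time. The attempt count is then geometric with mean 1/p. Raising first-attempt success from p_before to p_after therefore cuts expected latency by 1 - p_before/p_after. The rates below are hypothetical. They only show why the same percentage-point gain saves far more time when failures dominate. The model also ignores any per-step overhead the controller itself adds.

```python
def expected_attempts(p: float) -> float:
    """Mean number of tries until the first success when each try succeeds with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def latency_reduction(p_before: float, p_after: float) -> float:
    """Fractional cut in expected latency, assuming every attempt costs the same time."""
    return 1.0 - expected_attempts(p_after) / expected_attempts(p_before)


if __name__ == "__main__":
    # Hypothetical success rates, not figures from the paper: the same +20 point gain.
    print(f"{latency_reduction(0.2, 0.4):.2f}")  # failure-dominated: 0.50
    print(f"{latency_reduction(0.7, 0.9):.2f}")  # mostly succeeding: 0.22
```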