Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate
arXiv cs.AI / April 29, 2026
Key Points
- The paper introduces “Latent Agents,” a post-training method that distills multi-agent debate into a single LLM to reduce the heavy compute cost of generating long debate transcripts.
- A two-stage fine-tuning pipeline uses debate-structure learning plus dynamic reward scheduling and length clipping, achieving performance that matches or exceeds explicit multi-agent debate while using up to 93% fewer tokens.
- Mechanistic analysis via activation steering suggests internalization produces agent-specific subspaces in the model’s activation space, with interpretable directions corresponding to different agent perspectives.
- The authors show a control-oriented application: malicious agents can be deliberately instilled through internalized debate and then suppressed via negative steering, making harmful behaviors easier to localize and manage, with less overall performance degradation than comparable interventions applied to base models.
- The work includes released code, enabling reproducibility and further experimentation with distilled internalized reasoning behaviors.
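The negative-steering idea in the points above can be sketched in a few lines. This is a hypothetical illustration, not the paper's released code: a steering direction is estimated as the mean difference between hidden activations collected with and without the target agent persona active, and suppression subtracts a scaled copy of that direction from a hidden state at inference time. All function names and the toy 2-D activations are invented for the example.

```python
# Minimal sketch of activation steering (hypothetical names and data;
# the paper's actual hooks and released code are not reproduced here).

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(acts_with, acts_without):
    """Difference of means: the direction associated with the agent persona."""
    mu_with = mean_vector(acts_with)
    mu_without = mean_vector(acts_without)
    return [a - b for a, b in zip(mu_with, mu_without)]

def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state; alpha < 0 suppresses."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy example: 2-D activations where the persona shifts the first coordinate.
with_persona = [[1.0, 0.0], [1.2, 0.1]]
without_persona = [[0.0, 0.0], [0.2, 0.1]]
v = steering_vector(with_persona, without_persona)  # direction ≈ [1.0, 0.0]
h = steer([1.1, 0.05], v, alpha=-1.0)               # persona component removed
```

In a real model the same operation would be applied inside a forward hook on the residual stream; the paper's mechanistic analysis suggests such directions are interpretable because internalization places each agent's perspective in its own activation subspace.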