MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
arXiv cs.LG / 5/1/2026
Key Points
- The paper argues that deploying LLMs as autonomous agent “execution cores” changes workloads from single-turn GPU inference to multi-turn LLM–tool loops spanning both GPUs and CPUs.
- It introduces MARS, an adaptive co-scheduling system that coordinates heterogeneous agentic workloads under coupled GPU–CPU resource pressure using unified visibility and a control plane that separates admission from execution.
- MARS uses an internal, agent-centric scheduler to cut end-to-end critical path time by prioritizing latency-sensitive continuations and adaptively keeping KV cache only when it improves “warm resumption” latency.
- Experiments report up to 5.94× lower end-to-end latency at near-maximal throughput, and integrating MARS into the OpenHands coding agent accelerates task completion by up to 1.87×.
- The authors state that the MARS source code will be made publicly available soon.
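The adaptive KV-cache policy in the third key point can be pictured as a cost-benefit check: retain a paused agent session's KV cache only while the expected warm-resumption saving outweighs the memory it pins. The sketch below is illustrative only, assuming hypothetical names (`Session`, `keep_kv_cache`) and a simplified cost model; it is not MARS's actual policy.

```python
# Illustrative sketch (not from the paper): keep an agent session's KV
# cache across a tool call only when the expected prefill saving on a
# warm resume beats the pressure-weighted cost of pinning GPU memory.

from dataclasses import dataclass

@dataclass
class Session:
    kv_bytes: int        # size of the session's KV cache
    prefill_ms: float    # cost to recompute the prefix on a cold resume
    p_resume: float      # estimated probability the agent resumes soon

def keep_kv_cache(session: Session, free_bytes: int,
                  ms_per_byte: float = 1e-7) -> bool:
    """Retain the cache iff expected warm-resume savings exceed the
    memory-pressure cost (which grows as free GPU memory shrinks)."""
    expected_saving_ms = session.p_resume * session.prefill_ms
    pressure_cost_ms = session.kv_bytes * ms_per_byte * (
        session.kv_bytes / max(free_bytes, 1)
    )
    return expected_saving_ms > pressure_cost_ms

# A session with a costly prefill that will likely resume keeps its cache;
# an unlikely-to-resume session with a cheap prefill releases it.
hot = Session(kv_bytes=2 << 30, prefill_ms=800.0, p_resume=0.9)
cold = Session(kv_bytes=2 << 30, prefill_ms=50.0, p_resume=0.05)
print(keep_kv_cache(hot, free_bytes=16 << 30))   # True
print(keep_kv_cache(cold, free_bytes=16 << 30))  # False
```

Under this toy model, the same cache becomes less worth keeping as free memory shrinks, which matches the paper's framing of retention as a decision made under coupled resource pressure rather than a fixed rule.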