Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms
arXiv cs.CL · April 22, 2026
Key Points
- The paper argues that although large language models are powerful, their compute cost, latency, and privacy risks limit real-world deployment, motivating the use of small language models (under 10B parameters).
- It notes that prior work has mostly tried to improve small models via scaling laws or fine-tuning, rather than addressing their knowledge and reasoning gaps through agent paradigms.
- The authors conduct the first large-scale, comprehensive study comparing <10B open-source models deployed as (1) base models, (2) single tool-using agents, and (3) collaborative multi-agent systems.
- Results indicate that single-agent setups offer the best performance-to-cost trade-off, while multi-agent systems add coordination overhead for only limited gains.
- The study concludes that agent-centric system design is key to achieving efficient and trustworthy deployment of small models in resource-constrained environments.
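The paper's concrete setups are not detailed in this summary; as a purely hypothetical sketch, the three deployment paradigms and their differing call counts might look like the following. Every function name, prompt format, and the `small_model` stub below are invented for illustration, not taken from the paper.

```python
def small_model(prompt: str) -> str:
    """Stub standing in for a <10B model; returns canned replies for the demo."""
    if prompt.startswith("Tools:") and "TOOL_RESULT" not in prompt:
        return "CALL calculator(37 * 49)"  # agent setting: model requests a tool
    if "TOOL_RESULT" in prompt:
        # Incorporate the tool's output into a final answer.
        return "The answer is " + prompt.split("TOOL_RESULT: ")[1] + "."
    return "I can only guess: roughly 1800."  # no tools: rough estimate

def calculator(expr: str) -> str:
    # Minimal arithmetic tool; eval is acceptable for this trusted demo input.
    return str(eval(expr))

def base_model(query: str) -> tuple[str, int]:
    """(1) Base model: a single call, no tools. Returns (answer, model calls)."""
    return small_model(query), 1

def single_agent(query: str) -> tuple[str, int]:
    """(2) Single tool-using agent: the model may call one tool, then answer."""
    calls = 1
    reply = small_model(f"Tools: calculator\n{query}")
    if reply.startswith("CALL calculator("):
        expr = reply[len("CALL calculator("):-1]
        reply = small_model(f"{query}\nTOOL_RESULT: {calculator(expr)}")
        calls += 1
    return reply, calls

def multi_agent(query: str) -> tuple[str, int]:
    """(3) Multi-agent: planner + solver + verifier roles -> more model calls."""
    calls = 0
    small_model(f"Plan: {query}"); calls += 1          # planner call
    answer, agent_calls = single_agent(query); calls += agent_calls
    small_model(f"Verify: {answer}"); calls += 1       # verifier call
    return answer, calls
```

Running all three on the same query makes the reported trade-off concrete: the base model guesses in one call, the single agent answers exactly in two calls, and the multi-agent pipeline returns the same answer in four, doubling the cost without improving the result.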



