Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

arXiv cs.CL / 4/22/2026


Key Points

  • The paper argues that although large language models are powerful, their compute cost, latency, and privacy risks limit real-world deployment, motivating the use of small language models (under 10B parameters).
  • It notes that prior work has mostly tried to improve small models via scaling laws or fine-tuning, rather than addressing their knowledge and reasoning gaps through agent paradigms.
  • The authors conduct the first large-scale, comprehensive study comparing <10B open-source models used as (1) base models, (2) single tool-using agents, and (3) collaborative multi-agent systems.
  • Results indicate that single-agent setups provide the best performance-to-cost trade-off, while multi-agent systems introduce additional overhead with limited improvements.
  • The study concludes that agent-centric system design is key to achieving efficient and trustworthy deployment of small models in resource-constrained environments.

Abstract

Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising alternative; however, their inherent limitations in knowledge and reasoning curtail their effectiveness. Existing research primarily focuses on enhancing SLMs through scaling laws or fine-tuning strategies while overlooking the potential of using agent paradigms, such as tool use and multi-agent collaboration, to systematically compensate for the inherent weaknesses of small models. To address this gap, this paper presents the first large-scale, comprehensive study of <10B open-source models under three paradigms: (1) the base model, (2) a single agent equipped with tools, and (3) a multi-agent system with collaborative capabilities. Our results show that single-agent systems achieve the best balance between performance and cost, while multi-agent setups add overhead with limited gains. Our findings highlight the importance of agent-centric design for efficient and trustworthy deployment in resource-constrained settings.
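
To make the three paradigms concrete, here is a minimal Python sketch of how they differ in structure and in the number of model calls they make. It is not the authors' implementation. Every name in it (`run_base`, `run_single_agent`, `run_multi_agent`, `CountingModel`, `toy_model`, `add_tool`) and the `CALL <tool> <argument>` convention are assumptions made for illustration, and the "model" is a hard-coded stub rather than a real SLM.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "model" and a "tool" are both plain text-to-text functions.
Model = Callable[[str], str]
Tool = Callable[[str], str]


class CountingModel:
    """Wraps a model and counts calls, as a crude proxy for inference cost."""

    def __init__(self, fn: Model) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.fn(prompt)


def run_base(model: Model, question: str) -> str:
    """Paradigm 1: the base model answers directly in a single call."""
    return model(question)


def run_single_agent(model: Model, tools: Dict[str, Tool],
                     question: str, max_steps: int = 3) -> str:
    """Paradigm 2: one agent that may request tools before answering."""
    context = f"[tools: {', '.join(tools)}]\n{question}"
    for _ in range(max_steps):
        reply = model(context)
        # Assumed convention: a reply of the form "CALL <tool> <argument>".
        if not reply.startswith("CALL "):
            return reply
        parts = reply.split(" ", 2)
        name = parts[1] if len(parts) > 1 else ""
        arg = parts[2] if len(parts) > 2 else ""
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        context += f"\nObservation: {observation}"
    # Step budget exhausted: ask once more for a final answer.
    return model(context + "\nAnswer now.")


def run_multi_agent(workers: List[Model], aggregator: Model, question: str) -> str:
    """Paradigm 3: several agents draft answers, then an aggregator merges them."""
    drafts = [worker(question) for worker in workers]
    joined = "\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
    return aggregator(f"{question}\n{joined}\nFinal answer:")


def toy_model(prompt: str) -> str:
    """Hard-coded stub, not a real SLM: it can only answer with tool output."""
    if "Observation:" in prompt:
        return prompt.rsplit("Observation: ", 1)[1].strip()
    if "[tools" in prompt:
        return "CALL calc 2+3"
    return "unsure"


def add_tool(expr: str) -> str:
    """Toy calculator that only handles sums such as '2+3'."""
    return str(sum(int(x) for x in expr.split("+")))


if __name__ == "__main__":
    question = "What is 2+3?"
    for label, run in [
        ("base", lambda m: run_base(m, question)),
        ("single agent", lambda m: run_single_agent(m, {"calc": add_tool}, question)),
        ("multi-agent", lambda m: run_multi_agent([m, m], m, question)),
    ]:
        model = CountingModel(toy_model)
        print(label, "->", run(model), f"({model.calls} model calls)")
```

The call counts in this toy setup depend entirely on the stub and say nothing about the paper's measurements. The sketch only shows where the overhead comes from: each tool round trip and each extra agent adds at least one more model call.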