Anyone else hitting token/latency issues when using too many tools with agents?

Reddit r/LocalLLaMA / 3/20/2026

💬 Opinion · Tools & Practical Usage


Key Points

The author experiments with an agent that has access to 25–30 tools (APIs and internal utilities) and notes issues as the tool count grows beyond 10–15.
They observe prompt size blow-up, rapidly increasing token usage, and noticeably higher latency, especially with multi-step reasoning.
They tried trimming tool descriptions, grouping tools, and manually selecting subsets, but none feel clean or scalable.
They ask the community for approaches to manage many tools, such as limiting tools, using dynamic loading, or accepting trade-offs.
The post concludes that this could become a bigger problem as agents grow more capable and tools proliferate.

I’ve been experimenting with an agent setup where it has access to ~25–30 tools (mix of APIs + internal utilities).

The moment I scale beyond ~10–15 tools:

- prompt size blows up
- token usage gets expensive fast
- latency becomes noticeably worse (especially with multi-step reasoning)
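To put rough numbers on the blow-up, here is a back-of-envelope sketch. It assumes the full tool list is serialized into every model call (typical for stateless chat APIs, though prompt caching can soften it) and uses a crude 4-characters-per-token heuristic instead of a real tokenizer. The tool schema, `make_tool`, and `overhead_per_task` are hypothetical placeholders, not taken from any particular framework.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def make_tool(i: int) -> dict:
    """Hypothetical tool definition in a JSON-schema-like shape."""
    return {
        "name": f"tool_{i}",
        "description": "Placeholder description of what this tool does and when to call it.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Input for the tool."},
            },
            "required": ["query"],
        },
    }


def estimate_tokens(tools: list[dict]) -> int:
    """Approximate token count of the serialized tool definitions."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


def overhead_per_task(n_tools: int, n_steps: int) -> int:
    """Assumes the definitions are resent on every call, so overhead scales with steps."""
    tools = [make_tool(i) for i in range(n_tools)]
    return estimate_tokens(tools) * n_steps


for n in (10, 15, 30):
    print(f"{n} tools: ~{overhead_per_task(n, 1)} tokens per call, "
          f"~{overhead_per_task(n, 5)} tokens over 5 steps")
```

Real tool descriptions are usually longer than this placeholder, so treat the output as a lower bound on the shape of the curve, not as a measurement.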

I tried a few things:

- trimming tool descriptions
- grouping tools
- manually selecting subsets

But none of it feels clean or scalable.
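For reference, the "selecting subsets" step can be automated per request instead of done by hand. The sketch below is only an illustration under loud assumptions: the tool names and descriptions in `TOOLS` are made up, and plain keyword overlap stands in for whatever retrieval one would actually use (embeddings, for example).

```python
import re

# Hypothetical tool registry: name -> description (made-up examples)
TOOLS = {
    "get_weather": "Look up the current weather forecast for a city.",
    "search_tickets": "Search internal support tickets by keyword.",
    "create_invoice": "Create a billing invoice for a customer.",
    "send_email": "Send an email message to a recipient.",
    "query_database": "Run a read-only SQL query against the analytics database.",
}

# Small stopword list so filler words do not count as matches
STOPWORDS = {"a", "an", "the", "for", "to", "by", "in", "of"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its words, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def select_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tool names sharing the most words with the query."""
    q = tokenize(query)
    scored = []
    for name, desc in tools.items():
        overlap = len(q & tokenize(name.replace("_", " ") + " " + desc))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


print(select_tools("what's the weather forecast in Berlin", TOOLS))
```

Only the selected definitions would then be passed to the model for that request. The obvious trade-off is that a bad match hides a tool the agent actually needed.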

Curious how others here are handling this: