I’ve been experimenting with an agent setup where it has access to ~25–30 tools (mix of APIs + internal utilities).
The moment I scale beyond ~10–15 tools: - prompt size blows up - token usage gets expensive fast - latency becomes noticeably worse (especially with multi-step reasoning)
I tried a few things: - trimming tool descriptions - grouping tools - manually selecting subsets
But none of it feels clean or scalable.
Curious how others here are handling this:
- Are you limiting number of tools?
- Doing some kind of dynamic loading?
- Or just accepting the trade-offs?
Feels like this might become a bigger problem as agents get more capable.
[link] [comments]




