I’ve been building multi-step LLM agents (LLM + tools), and debugging them has been way harder than I expected.
Some recurring issues I keep hitting:
- the model emitting invalid JSON and breaking the workflow (rough mitigation sketch after this list)
- prompts growing too large across steps
- latency spikes from specific tools
- no clear way to understand what changed between runs
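For the invalid-JSON one, the only thing that's reliably helped me is validating every payload and re-prompting with the parse error. A minimal sketch of what I mean — `call_llm` here is a stand-in for whatever your client call is, not a real API:

```python
import json

def call_with_json_retry(call_llm, prompt, max_retries=2):
    """Ask the model for JSON; re-prompt with the parse error on failure.

    `call_llm` is a placeholder for whatever function returns the raw
    model text -- swap in your own client call.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        if attempt == 0:
            raw = call_llm(prompt)
        else:
            raw = call_llm(
                f"{prompt}\n\nYour last reply was not valid JSON "
                f"({last_error}). Reply with JSON only."
            )
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = e
    raise ValueError(f"no valid JSON after {max_retries + 1} attempts: {last_error}")
```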
Once flows get even slightly complex, plain logs stop being much help.
I’m curious how others are handling this — especially for multi-step agents.
Are you just relying on logs + retries, or using some kind of tracing / visualization?
I ended up building a small tracing setup for myself to see runs → spans → inputs/outputs, which helped a lot, but I’m wondering what approaches others are using.
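The core of it is just a tracer object plus a context manager per step, so every span records its inputs, output, duration, and any error, and each run dumps to one JSON file (which also makes diffing two runs easy). This is a simplified sketch of the shape, not the actual code:

```python
import json
import time
import uuid
from contextlib import contextmanager

class Tracer:
    """Record one run as a list of spans with inputs, outputs, and timing."""

    def __init__(self):
        self.run_id = str(uuid.uuid4())
        self.spans = []

    @contextmanager
    def span(self, name, **inputs):
        record = {"name": name, "inputs": inputs, "start": time.time()}
        try:
            yield record  # the step stashes its output on the record
        except Exception as e:
            record["error"] = repr(e)  # keep failed spans, they matter most
            raise
        finally:
            record["duration_s"] = round(time.time() - record.pop("start"), 3)
            self.spans.append(record)

    def dump(self, path):
        # one JSON file per run -> trivial to diff two runs
        with open(path, "w") as f:
            json.dump({"run_id": self.run_id, "spans": self.spans}, f, indent=2)

# usage inside an agent step:
tracer = Tracer()
with tracer.span("search_tool", query="llm tracing") as s:
    s["output"] = "...tool result..."
tracer.dump(f"run-{tracer.run_id}.json")
```

The per-span `duration_s` is also what surfaced the tool latency spikes for me.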