been messing around with some agent / RAG pipelines
keep running into cases where everything executes fine (tool calls return expected outputs, parsing works, etc.) but the final answer is still wrong or slightly off
nothing crashes, just bad outputs
curious how people are actually debugging this in practice
are you:
- using evals?
- using tracing tools (LangSmith etc.)?
- stepping through logs manually?
- or just accepting some % of bad outputs?
feels like there are a lot of cases where nothing technically fails but the output is still wrong
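
To make the "evals" option concrete, a minimal sketch of a per-stage check might look like the one below. It assumes each run leaves behind the retrieved document IDs and the final answer, and that you have at least one known-good case to compare against. `Trace`, `Case`, and `diagnose` are made-up names for illustration (not from LangSmith or any other library), and the refund data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Record of one pipeline run (hypothetical structure, not from any library)."""
    question: str
    retrieved_ids: list[str] = field(default_factory=list)
    answer: str = ""


@dataclass
class Case:
    """One eval case: what should have been retrieved and what the answer should contain."""
    question: str
    expected_doc_id: str
    expected_substring: str


def diagnose(trace: Trace, case: Case) -> str:
    """Return the first stage whose output looks wrong, or 'ok'."""
    if case.expected_doc_id not in trace.retrieved_ids:
        return "retrieval"  # the right document never reached the model
    if case.expected_substring.lower() not in trace.answer.lower():
        return "generation"  # context was fine, answer is still off
    return "ok"


# Invented example data: three runs of the same question.
case = Case("What is the refund window?", "policy-7", "30 days")
runs = [
    Trace(case.question, ["policy-7", "faq-2"], "Refunds are accepted within 30 days."),
    Trace(case.question, ["faq-2", "blog-9"], "Refunds are accepted within 14 days."),
    Trace(case.question, ["policy-7"], "Refunds are accepted within 14 days."),
]
results = [diagnose(t, case) for t in runs]
print(results)  # ['ok', 'retrieval', 'generation']
```

None of the three runs crashes, but only the first is right. The point of splitting the check by stage is that a bad final answer gets attributed to retrieval or generation instead of being one opaque failure. Substring matching is a crude stand-in for whatever answer grading you actually use.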