Hey everyone,
Over the past few months I’ve been building and testing different RAG setups (LangChain, LlamaIndex, custom pipelines, etc.), and I kept running into the same frustrating issue.
When a RAG system starts producing bad answers, everyone immediately blames the LLM.
But most of the time the actual problem is somewhere in the pipeline.
Things like:
• documents aren’t chunked correctly
• embeddings don’t match the retrieval model
• retrieval isn’t actually happening when you think it is
• context window is overflowing
• vector search is misconfigured
• prompt injection risks

After debugging this stuff over and over, I started building a small CLI tool that analyzes a codebase and tries to detect structural problems in RAG pipelines.
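To make one of these concrete, here's a rough sketch of what a deterministic check for the embeddings/index mismatch could look like. Everything here (the function name, the config shape, the model table) is my own illustration, not code from the repo:

```python
# Hypothetical check: does the declared vector index dimension match the
# output dimension of the configured embedding model?
# The model -> dimension table below is a tiny illustrative sample.
KNOWN_EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "all-MiniLM-L6-v2": 384,
}

def check_embedding_dim(config: dict) -> list[str]:
    """Return a list of findings; an empty list means the check passed."""
    findings = []
    model = config.get("embedding_model")
    index_dim = config.get("index_dim")
    expected = KNOWN_EMBEDDING_DIMS.get(model)
    if expected is not None and index_dim is not None and expected != index_dim:
        findings.append(
            f"index_dim={index_dim} but {model} produces {expected}-dim vectors"
        )
    return findings

# A 384-dim model pointed at a 1536-dim index gets flagged:
print(check_embedding_dim({"embedding_model": "all-MiniLM-L6-v2", "index_dim": 1536}))
```

The nice thing about this class of bug is that it's statically detectable: no LLM call needed, just config comparison.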
The idea is basically:
“ESLint but for RAG architectures.”
The tool parses the codebase, runs a rule engine, and reports possible issues.
One important design choice I made:
the analysis itself is deterministic. AI is only used to explain the findings in plain language.
That way the tool can still run in CI and produce reproducible results.
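For anyone curious what "parse, run rules, report" means in practice, here's a minimal sketch of that loop, assuming the pipeline code is Python. The rule shown (and its 8000-token threshold) is an invented example, not one of the tool's actual rules. Because rules are plain functions over the AST, the same source always yields the same findings:

```python
import ast

def rule_large_chunk_size(tree: ast.AST) -> list[str]:
    # Flag chunk_size=... keyword arguments with literal values big enough
    # to risk overflowing a typical context window (threshold is arbitrary).
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (
                    kw.arg == "chunk_size"
                    and isinstance(kw.value, ast.Constant)
                    and isinstance(kw.value.value, int)
                    and kw.value.value > 8000
                ):
                    findings.append(
                        f"line {node.lineno}: chunk_size={kw.value.value} "
                        f"may overflow the context window"
                    )
    return findings

# The rule engine is just "run every rule, collect every finding".
RULES = [rule_large_chunk_size]

def analyze(source: str) -> list[str]:
    tree = ast.parse(source)
    return [finding for rule in RULES for finding in rule(tree)]

print(analyze("docs = splitter.split(text, chunk_size=16000)"))
```

Since `analyze` is a pure function of the source text, its output diffs cleanly between CI runs, which is exactly what you want from a linter.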
It’s still early, but I’m curious:
What RAG issues are you seeing most often in real projects?
Also if anyone wants to try breaking it with weird pipelines, that would actually be very helpful.
Repo:
https://github.com/NeuroForgeLabs/rag-doctor
Would really appreciate feedback from people building RAG systems.