Been building RAG pipelines for a while and kept hitting the same wall: the retrieval works fine on simple documents but falls apart the moment you throw a dense financial report, legal filing, or technical manual at it.
Spent some time digging into why and it basically comes down to one thing — similarity is not the same as relevance. The chunk that scores highest cosine similarity to your query is often not the chunk that actually answers it. Especially when the answer lives in an appendix that a cross-reference points to, or requires understanding the document's structure rather than just matching surface text.
Came across PageIndex (github.com/VectifyAI/PageIndex, 21.5k stars) which takes a completely different approach: no embeddings, no vector DB, no chunking. Instead it builds a hierarchical tree index of the document (like a rich ToC) and uses an LLM to reason over that tree to find the answer. It basically simulates how a human expert actually navigates a document.
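To make the idea concrete, here's a toy sketch of tree-based retrieval (not PageIndex's actual API, and all names below are made up): build a ToC-like tree, then walk it top-down, asking a "judge" which branch most likely holds the answer. In the real system the judge is an LLM call; here it's a keyword-overlap stub so the example runs standalone.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)

def judge(query: str, node: Node) -> int:
    # Stand-in for an LLM relevance call: count shared words.
    words = set(query.lower().split())
    text = (node.title + " " + node.summary).lower()
    return sum(1 for w in words if w in text)

def navigate(query: str, node: Node) -> Node:
    # Descend greedily until we reach a leaf section.
    while node.children:
        node = max(node.children, key=lambda c: judge(query, c))
    return node

doc = Node("10-K", "annual report", [
    Node("Item 7", "management discussion of results", [
        Node("Revenue", "revenue by segment, see Appendix G"),
        Node("Liquidity", "cash flow and debt"),
    ]),
    Node("Appendix G", "segment revenue tables by region"),
])

hit = navigate("revenue by region", doc)
print(hit.title)  # → Appendix G
```

Note the tree walk jumps straight to the appendix because the appendix is a first-class node with its own summary, instead of an orphaned chunk.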
Their Mafin 2.5 system hit 98.7% on FinanceBench using this approach, which is well above typical vector RAG numbers on the same benchmark.
The failure modes I kept running into with vector RAG:
- Hard chunking destroys document structure
- Cross-references like "see Appendix G" are completely invisible to the retriever
- Each query is stateless: no memory across turns
- No audit trail: you just get cosine scores, with no explanation of why a chunk was picked
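The cross-reference failure is easy to reproduce in miniature. Below is a toy demo using bag-of-words vectors and cosine similarity instead of a real embedding model (the chunks and query are invented for the example): the chunk that merely *points* at the answer outscores the chunk that *contains* it, because the pointer shares surface vocabulary with the query and the numbers don't.

```python
import math
from collections import Counter

def vec(text: str) -> Counter:
    # Bag-of-words term counts as a crude stand-in for an embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = {
    "body":     "for segment revenue figures see appendix g",
    "appendix": "north america 4.2bn europe 1.9bn asia 1.1bn",
}

query = "what was segment revenue by region"
scores = {k: cosine(vec(query), vec(v)) for k, v in chunks.items()}
best = max(scores, key=scores.get)
print(best)  # → body  (the pointer chunk wins; the actual numbers score 0)
```

Real embeddings soften this a bit, but the underlying problem (relevance mediated by a reference, not by shared text) survives.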
Curious if others have hit the same issues and what your workarounds have been. Also interested in whether anyone's benchmarked PageIndex against hybrid approaches (BM25 + vector, for example).
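For anyone benchmarking: the hybrid baseline I'd compare against is the usual reciprocal rank fusion (RRF) over a BM25 ranking and a vector ranking. Minimal sketch with hard-coded placeholder rankings (doc IDs are made up):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Score each doc by the sum of 1/(k + rank) across the input rankings;
    # k=60 is the constant from the original RRF paper.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_rank   = ["appendix_g", "liquidity", "body"]
vector_rank = ["body", "appendix_g", "liquidity"]
fused = rrf([bm25_rank, vector_rank])
print(fused[0])  # → appendix_g
```

RRF only needs ranks, not comparable scores, which is why it's the default way to fuse BM25 with dense retrieval.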
Full writeup with diagrams if anyone wants the deeper dive: https://medium.com/data-science-collective/your-rag-system-isnt-retrieving-it-s-guessing-809dd8f378df