Been building RAG pipelines for a while and kept hitting the same wall: the retrieval works fine on simple documents but falls apart the moment you throw a dense financial report, legal filing, or technical manual at it.
Spent some time digging into why and it basically comes down to one thing — similarity is not the same as relevance. The chunk that scores highest cosine similarity to your query is often not the chunk that actually answers it. Especially when the answer lives in an appendix that a cross-reference points to, or requires understanding the document's structure rather than just matching surface text.
Came across PageIndex (github.com/VectifyAI/PageIndex, 21.5k stars) which takes a completely different approach: no embeddings, no vector DB, no chunking. Instead it builds a hierarchical tree index of the document (like a rich ToC) and uses an LLM to reason over that tree to find the answer. It basically simulates how a human expert actually navigates a document.
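To make the idea concrete, here's a toy sketch of tree-based retrieval (not PageIndex's actual API, and all names below are made up): build a ToC-like tree, then walk it top-down, asking a "judge" which branch most likely holds the answer. In the real system the judge is an LLM call; here it's a keyword-overlap stub so the example runs standalone.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)

def judge(query: str, node: Node) -> int:
    # Stand-in for an LLM relevance call: count shared words.
    words = set(query.lower().split())
    text = (node.title + " " + node.summary).lower()
    return sum(1 for w in words if w in text)

def navigate(query: str, node: Node) -> Node:
    # Descend greedily until we reach a leaf section.
    while node.children:
        node = max(node.children, key=lambda c: judge(query, c))
    return node

doc = Node("10-K", "annual report", [
    Node("Item 7", "management discussion of results", [
        Node("Revenue", "revenue by segment, see Appendix G"),
        Node("Liquidity", "cash flow and debt"),
    ]),
    Node("Appendix G", "segment revenue tables by region"),
])

hit = navigate("revenue by region", doc)
print(hit.title)  # → Appendix G
```

Note the tree walk jumps straight to the appendix because the appendix is a first-class node with its own summary, instead of an orphaned chunk.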
Their Mafin 2.5 system hit 98.7% on FinanceBench using this approach, which is well above typical vector RAG numbers on the same benchmark.
The failure modes I kept running into with vector RAG:
- Hard chunking destroys document structure
- Cross-references like "see Appendix G" are completely invisible to the retriever
- Each query is stateless: no memory across turns
- No audit trail: you just get cosine scores, with no explanation of why a chunk was picked
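The cross-reference failure is easy to reproduce in miniature. Below is a toy demo using bag-of-words vectors and cosine similarity instead of a real embedding model (the chunks and query are invented for the example): the chunk that merely *points* at the answer outscores the chunk that *contains* it, because the pointer shares surface vocabulary with the query and the numbers don't.

```python
import math
from collections import Counter

def vec(text: str) -> Counter:
    # Bag-of-words term counts as a crude stand-in for an embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = {
    "body":     "for segment revenue figures see appendix g",
    "appendix": "north america 4.2bn europe 1.9bn asia 1.1bn",
}

query = "what was segment revenue by region"
scores = {k: cosine(vec(query), vec(v)) for k, v in chunks.items()}
best = max(scores, key=scores.get)
print(best)  # → body  (the pointer chunk wins; the actual numbers score 0)
```

Real embeddings soften this a bit, but the underlying problem (relevance mediated by a reference, not by shared text) survives.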
Curious if others have hit the same issues and what your workarounds have been. Also interested in whether anyone's benchmarked PageIndex against hybrid approaches (BM25 + vector, for example).
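For anyone benchmarking: the hybrid baseline I'd compare against is the usual reciprocal rank fusion (RRF) over a BM25 ranking and a vector ranking. Minimal sketch with hard-coded placeholder rankings (doc IDs are made up):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Score each doc by the sum of 1/(k + rank) across the input rankings;
    # k=60 is the constant from the original RRF paper.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_rank   = ["appendix_g", "liquidity", "body"]
vector_rank = ["body", "appendix_g", "liquidity"]
fused = rrf([bm25_rank, vector_rank])
print(fused[0])  # → appendix_g
```

RRF only needs ranks, not comparable scores, which is why it's the default way to fuse BM25 with dense retrieval.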
Full writeup with diagrams if anyone wants the deeper dive: https://medium.com/data-science-collective/your-rag-system-isnt-retrieving-it-s-guessing-809dd8f378df