BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering
arXiv cs.CL / 4/27/2026
Key Points
- The paper argues that standard RAG for question answering, which concatenates all retrieved documents into a single context, makes the contribution of each individual document hard to trace and worsens the “lost-in-the-middle” problem, especially with long contexts and visual data.
- It introduces BERAG, a Bayesian Ensemble RAG framework that conditions a language model on each retrieved document separately and uses Bayesian posterior probabilities as ensemble weights, updated token-by-token during generation.
- The method supports probabilistic re-ranking, memory usage that can be parallelized across documents, and clearer attribution of how each document influenced the final answer, which makes it well suited to large document collections.
- Experiments on knowledge-based visual question answering show substantial improvements over standard RAG, including gains on Document VQA and multimodal “needle-in-a-haystack” benchmarks, and BERAG is shown to mitigate lost-in-the-middle.
- The authors also report two practical mechanisms: using document posteriors to detect insufficient grounding and trigger “deflection” (declining to answer), and using document pruning to speed up decoding relative to standard RAG.
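
The key points describe the decoding loop only at a high level. The sketch below assumes the standard Bayesian model-averaging form: the next-token distribution is the mixture p(y_t | y_<t) = Σ_d p(d | y_<t) · p(y_t | y_<t, d), and after each emitted token every document's weight is multiplied by its likelihood of that token, p(d | y_≤t) ∝ p(d | y_<t) · p(y_t | y_<t, d). Everything named here is hypothetical and not taken from the paper: the `next_token_probs` callback (one model pass conditioned on the question plus a single document), the prior (for example, normalized retriever scores), greedy decoding, and the `prune_below` and `deflect_below` thresholds. The pruning and deflection rules are simple illustrative heuristics, not the authors' criteria.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: next-token distribution (list of probabilities over the
# vocabulary) from a model conditioned on the question plus ONE document.
NextTokenFn = Callable[[int, Sequence[int]], List[float]]


def normalize(log_w: Dict[int, float]) -> Dict[int, float]:
    """Turn unnormalized log-weights into posterior probabilities (log-sum-exp)."""
    m = max(log_w.values())
    z = sum(math.exp(v - m) for v in log_w.values())
    return {d: math.exp(v - m) / z for d, v in log_w.items()}


def bayesian_ensemble_decode(
    next_token_probs: NextTokenFn,
    prior: Dict[int, float],     # assumed p(d | q); every value must be > 0
    eos_id: int,
    max_new_tokens: int = 32,
    prune_below: float = 1e-3,   # hypothetical pruning threshold
    deflect_below: float = 0.5,  # hypothetical grounding threshold
) -> Tuple[List[int], Dict[int, float], bool]:
    log_w = {d: math.log(p) for d, p in prior.items()}
    tokens: List[int] = []
    for _ in range(max_new_tokens):
        post = normalize(log_w)
        # Pruning: drop documents whose posterior has collapsed (always keep the
        # current best) so later steps run the model on fewer documents.
        best = max(post, key=post.get)
        log_w = {d: log_w[d] for d, p in post.items() if p >= prune_below or d == best}
        post = normalize(log_w)
        # One pass per surviving document; these calls are independent and
        # could run in parallel.
        per_doc = {d: next_token_probs(d, tokens) for d in post}
        vocab_size = len(per_doc[best])
        # Predictive distribution: posterior-weighted mixture over documents.
        mixture = [sum(post[d] * per_doc[d][v] for d in post) for v in range(vocab_size)]
        tok = max(range(vocab_size), key=lambda v: mixture[v])  # greedy decoding
        tokens.append(tok)
        # Bayes update: reweight each document by its likelihood of the chosen token.
        for d in log_w:
            log_w[d] += math.log(max(per_doc[d][tok], 1e-12))
        if tok == eos_id:
            break
    post = normalize(log_w)
    # Hypothetical deflection rule: if no single document clearly explains the
    # generated answer, flag it as insufficiently grounded.
    deflect = max(post.values()) < deflect_below
    return tokens, post, deflect
```

Because the returned posterior is a distribution over documents, sorting it (for example `sorted(post, key=post.get, reverse=True)`) gives a probabilistic re-ranking, and each document's final weight serves as a per-document attribution score. Keeping the weights in log space avoids underflow when many small token likelihoods are multiplied.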




