Is Qwen 3.5 0.8B the optimal choice for local RAG implementations in 2026?
Reddit r/LocalLLaMA / 3/20/2026
💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Recent benchmarks, specifically the AA-Omniscience Hallucination Rate, suggest a counter-intuitive trend. While larger models in the Qwen 3.5 family (9B and 397B) show hallucination rates exceeding 80% in "all-knowing" tests, the Qwen 3.5 0.8B variant demonstrates a significantly lower rate of approximately 37%. For those using AnythingLLM, have you found that the 0.8B parameter scale provides better "faithfulness" to the retrieved embeddings compared to larger models?
Key Points
- Recent benchmarks indicate the Qwen 3.5 0.8B model has a lower AA-Omniscience Hallucination Rate of about 37%, versus larger Qwen 3.5 variants that exceed 80% in all-knowing tests.
- In AnythingLLM-based RAG workflows, the 0.8B variant may offer better faithfulness to retrieved embeddings than larger models.
- This challenges the assumption that bigger models always excel at knowledge-intensive tasks, showing larger models can be more prone to hallucinations.
- For local RAG deployments in 2026, smaller 0.8B-scale models could be a preferable default depending on use-case, resources, and latency constraints.
- The post by user koloved linking to benchmarks signals active, ongoing evaluation in the local-LLaMA community.
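The "faithfulness to retrieved embeddings" the key points describe can be spot-checked locally. Below is a minimal, hypothetical sketch (not part of AnythingLLM or the AA-Omniscience benchmark; all names are made up for illustration): it scores what fraction of an answer's content words are grounded in the retrieved chunks, so a low score flags a likely hallucination.

```python
import re

# Tiny stopword list for illustration only; a real pipeline would use a
# proper tokenizer and an entailment or citation-based faithfulness check.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "on", "and", "that"}

def content_words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens minus the stopword list."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def faithfulness_score(answer: str, retrieved_chunks: list[str]) -> float:
    """Fraction of the answer's content words that appear in the retrieved text."""
    answer_terms = content_words(answer)
    if not answer_terms:
        return 1.0  # an empty answer is trivially grounded
    context_terms = content_words(" ".join(retrieved_chunks))
    return len(answer_terms & context_terms) / len(answer_terms)

# Example: one retrieved chunk, one grounded answer, one ungrounded answer.
chunks = ["Qwen 3.5 0.8B showed a 37% hallucination rate on the benchmark."]
print(faithfulness_score("The 0.8B model showed a 37% rate.", chunks))
print(faithfulness_score("The model was trained on 15T tokens.", chunks))
```

Lexical overlap is a crude proxy, but running a check like this over a held-out question set is one way to compare the 0.8B variant against larger models on your own corpus rather than relying on the benchmark alone.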