I've been pretty unsatisfied with web search options for local LLM/RAG systems. Most setups rely either on paid APIs like Brave or on metasearch scrapers like SearXNG, so I built LLMSearchIndex, a Python library for fully local, internet-scale search. It uses a custom-trained, highly compressed search index that contains most of the webpages from FineWeb plus Wikipedia. The full index is only ~2GB and runs locally on most hardware with pretty fast retrieval speeds. The library makes it easy to retrieve these results as RAG context. You can also check out a demo here: https://zakerytclarke-llmsearchindex.hf.space/
LLMSearchIndex: an Open Source Local Web Search Library with over 200 million indexed Web Pages for RAG applications
Reddit r/LocalLLaMA / 5/4/2026
📰 News, Developer Stack & Infrastructure, Tools & Practical Usage, Models & Research
Key Points
- LLMSearchIndex is an open-source Python library that provides fully local, internet-scale web search tailored for LLM/RAG systems.
- The project uses a custom, highly compressed search index built from most pages of FineWeb plus Wikipedia, with the full index size at around 2GB.
- It is designed to run on most local hardware while delivering fast retrieval speeds to supply relevant context for RAG.
- The library includes a simple API for querying and retrieving top-k results, and there is also an online demo for trying it out.
- The author positions it as an alternative to paid search APIs and meta-search scrapers like SearXNG for local deployments.
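
To make the "top-k results as RAG context" idea from the key points concrete, here is a minimal toy sketch of the general pattern: score documents against a query, keep the k best, and paste them into a prompt. This is not LLMSearchIndex's API or its compressed index. The corpus, the plain TF-IDF scoring, and every name (`DOCS`, `build_index`, `search`, `build_context`) are assumptions made up for illustration.

```python
import heapq
import math
from collections import Counter

# Toy corpus standing in for a real web index. Everything here is
# hypothetical and unrelated to LLMSearchIndex's actual API or index format.
DOCS = {
    "doc1": "Python is a programming language used for machine learning",
    "doc2": "The Eiffel Tower is a landmark in Paris France",
    "doc3": "Retrieval augmented generation supplies search results to a language model",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_index(docs):
    """Precompute per-document term counts and per-term document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, term_counts, doc_freq, k=2):
    """Return up to k (score, doc_id) pairs, best first, using plain TF-IDF."""
    n_docs = len(term_counts)
    scored = []
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                idf = math.log(1 + n_docs / doc_freq[term])
                score += counts[term] * idf
        if score > 0:  # skip documents that share no term with the query
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def build_context(query, docs, hits):
    """Format the retrieved passages as context ahead of the user's question."""
    passages = "\n".join(f"- {docs[doc_id]}" for _, doc_id in hits)
    return f"Context:\n{passages}\n\nQuestion: {query}"


if __name__ == "__main__":
    term_counts, doc_freq = build_index(DOCS)
    hits = search("language model", term_counts, doc_freq, k=2)
    print(build_context("language model", DOCS, hits))
```

A real internet-scale index would swap the in-memory dictionaries for a compact on-disk structure, but the calling pattern (query in, ranked passages out, passages joined into the prompt) stays the same.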



