LLMSearchIndex: An Open-Source Local Web Search Library with Over 200 Million Indexed Web Pages for RAG Applications

Reddit r/LocalLLaMA / 5/4/2026


Key Points

  • LLMSearchIndex is an open-source Python library that provides fully local, internet-scale web search tailored for LLM/RAG systems.
  • The project uses a custom, highly compressed search index built from most pages of FineWeb plus Wikipedia, and the full index is only about 2 GB.
  • It is designed to run on most local hardware while delivering fast retrieval speeds to supply relevant context for RAG.
  • The library includes a simple API for querying and retrieving top-k results, and there is also an online demo for trying it out.
  • The author positions it as an alternative to paid search APIs and meta-search scrapers like SearXNG for local deployments.

I've been pretty unsatisfied with web search options for local LLM/RAG systems. Most setups either rely on paid APIs like Brave or on meta-search scrapers like SearXNG.

So I built LLMSearchIndex, a Python library for fully local, internet-scale search. It uses a custom-trained, highly compressed search index that covers most of the web pages in FineWeb plus Wikipedia. The full index is only ~2 GB and runs locally on most hardware with pretty fast retrieval.
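
For a sense of how aggressive that compression is, here is a back-of-envelope calculation using the ~2 GB and 200 million figures from the post (assuming 1 GB = 10^9 bytes). It works out to roughly 10 bytes per indexed page:

```latex
\frac{2 \times 10^{9}\ \text{bytes}}{2 \times 10^{8}\ \text{pages}} = 10\ \text{bytes per page}
```

With "over" 200 million pages the real per-page budget is a bit lower still. That is far too little to hold each page's full text, so the index has to be a very compact representation rather than a copy of the pages.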

The library makes it easy to retrieve these results as RAG context:

```python
from llmsearchindex import LLMIndex

index = LLMIndex()
# Return the 5 best-matching results for the query
results = index.search("who invented sliced bread?", top_k=5)
```
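
The snippet stops at the search call. Below is a minimal sketch of the next RAG step: packing retrieved text into a prompt. It assumes you have already pulled plain-text passages out of `results`. The post does not show what `index.search` returns, so that extraction step is left to you. `build_rag_prompt` and its `max_chars` budget are hypothetical helpers, not part of LLMSearchIndex.

```python
from collections.abc import Sequence


def build_rag_prompt(question: str, passages: Sequence[str], max_chars: int = 4000) -> str:
    """Pack retrieved passages into a numbered context block for an LLM prompt.

    Hypothetical helper: `passages` is assumed to be plain text already
    extracted from the search results (the result format of LLMIndex.search
    is not shown in the post).
    """
    lines: list[str] = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        snippet = " ".join(passage.split())  # collapse runs of whitespace/newlines
        entry = f"[{i}] {snippet}"
        # Stop at the first entry that would push the passage text past
        # max_chars (newline separators are not counted).
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    context = "\n".join(lines)
    return (
        "Answer the question using only the numbered sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder passages; in practice these would be extracted from index.search(...).
    demo = ["First retrieved passage\nspanning two lines.", "Second retrieved passage."]
    print(build_rag_prompt("who invented sliced bread?", demo))
```

Numbering the sources lets the model cite [1], [2], and so on. Stopping at the first passage that overflows the budget keeps the highest-ranked results, assuming the search returns them best-first.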

You can also check out a demo here: https://zakerytclarke-llmsearchindex.hf.space/

submitted by /u/zakerytclarke