I built a Model Context Protocol (MCP) index of 3 Million arXiv papers for LLMs. [D]

Reddit r/MachineLearning / 5/19/2026


Key Points

  • The author has built a Model Context Protocol (MCP) index covering roughly 3 million arXiv papers to help connect local and cloud LLMs to a large ML/STEM corpus.
  • The stated goal is to reduce hallucinated citations and improve research workflows by improving how LLMs retrieve relevant literature.
  • The index is already live, but the author is seeking validation by stress-testing retrieval quality against highly niche and complex queries, including obscure math and hyper-specific domains.
  • They are inviting a small group of about 20 users to try the system, attempt to break it, and provide blunt feedback on the relevance of retrieved papers.
  • Interested users can contact the author to get connection details for testing with their own LLM setups and daily research queries.

Hey everyone,

I recently finished building a Model Context Protocol (MCP) index containing roughly 3 million arXiv papers. My goal was to make it easier to connect local and cloud LLMs directly to a massive corpus of ML and STEM research to help reduce hallucinated citations and improve research workflows.

The index is live, but before I open it up broadly, I want to make sure the retrieval quality actually holds up against highly niche, complex queries (especially for obscure math, hyper-specific domains, or newer architectures).

I’m looking for a small group of folks (around 20) to try it out, do their best to break the retrieval system, and give me brutal feedback on the relevance of the fetched papers.

If you want to stress-test it with your own LLM setup and see how it performs with your daily research queries, let me know in the comments or shoot me a DM and I’ll send you the connection details!
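
For testers who want their relevance feedback to be comparable across queries, one option is to note which of the returned papers were actually on-topic and turn that into a couple of standard retrieval scores. The snippet below is only a minimal sketch of that idea: the paper IDs are made-up placeholders, the relevance labels are assumed to come from the tester's own judgment, and nothing in it reflects the index's real API or output format.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that the tester judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for paper_id in retrieved[:k] if paper_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was returned."""
    for rank, paper_id in enumerate(retrieved, start=1):
        if paper_id in relevant:
            return 1.0 / rank
    return 0.0


# Placeholder IDs (hypothetical, not real arXiv identifiers), in the order
# the retrieval system returned them for one niche query.
retrieved = ["paper-a", "paper-b", "paper-c", "paper-d", "paper-e"]

# The subset the tester hand-labeled as genuinely relevant.
relevant = {"paper-b", "paper-d"}

p_at_5 = precision_at_k(retrieved, relevant, k=5)
rr = reciprocal_rank(retrieved, relevant)
print(f"precision@5 = {p_at_5:.2f}, reciprocal rank = {rr:.2f}")
```

Reporting the query text alongside these two numbers makes it easier to see whether failures cluster in particular domains, such as obscure math or newer architectures.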

Thanks!

submitted by /u/Divyansh3021