I built a Model Context Protocol (MCP) index of 3 million arXiv papers for LLMs. [D]

Reddit r/MachineLearning / 5/19/2026

Key Points

The author has built a Model Context Protocol (MCP) index covering roughly 3 million arXiv papers to help connect local and cloud LLMs to a large ML/STEM corpus.
The stated goal is to reduce hallucinated citations and improve research workflows by improving how LLMs retrieve relevant literature.
The index is already live, but the author is seeking validation by stress-testing retrieval quality against highly niche and complex queries, including obscure math and hyper-specific domains.
They are inviting a small group of about 20 users to try the system, attempt to break it, and provide blunt feedback on the relevance of retrieved papers.
Interested users can contact the author to get connection details for testing with their own LLM setups and daily research queries.

Hey everyone,

I recently finished building a Model Context Protocol (MCP) index containing roughly 3 million arXiv papers. My goal is to make it easier to connect local and cloud LLMs directly to a large corpus of ML and STEM research, which should help reduce hallucinated citations and improve research workflows.

The index is live, but before I open it up broadly, I want to make sure the retrieval quality actually holds up against highly niche, complex queries (especially for obscure math, hyper-specific domains, or newer architectures).

I’m looking for a small group of folks (around 20) to try it out, try to break the retrieval system, and give me brutal feedback on the relevance of the fetched papers.

If you want to stress-test it with your own LLM setup and see how it performs with your daily research queries, let me know in the comments or shoot me a DM and I’ll send you the connection details!

Thanks!

submitted by /u/Divyansh3021