LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

arXiv cs.CL / 3/26/2026



Key Points

LLMpedia is presented as a framework that materializes an LLM’s encyclopedic knowledge at scale purely from parametric memory, without retrieval, generating about 1M articles across three model families.
The authors argue that benchmark scores (e.g., MMLU-style factuality saturation above 90%) paint an incomplete picture, reporting lower verifiable true rates on Wikipedia-covered subjects (74.7% for gpt-5-mini) and even lower rates on frontier subjects verified via curated web evidence (63.2%).
The study highlights evaluation limitations such as availability bias and subject coverage constraints, noting Wikipedia covers only 61% of surfaced subjects and overlap in subject choice across three model families is just 7.3%.
Using a “capture-trap” benchmark inspired by Grokipedia, LLMpedia shows substantially higher factuality while achieving roughly half the textual similarity to Wikipedia.
The work emphasizes transparency: every prompt, artifact, and evaluation verdict is publicly released, and the authors describe LLMpedia as the first fully open parametric encyclopedia. Data, code, and a browsable interface are available at llmpedia.net.
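To make the size of the gap concrete, the short sketch below subtracts the reported true rates from the 90% benchmark level quoted above. It is only illustrative arithmetic on the numbers in the key points. The constant and function names are made up for this example, and 90% is treated as a lower bound because the source says "above 90%".

```python
# Figures quoted in the key points above (all in percent).
BENCHMARK_FLOOR = 90.0      # MMLU-style scores are reported as "above 90%"
WIKIPEDIA_TRUE_RATE = 74.7  # gpt-5-mini, Wikipedia-covered subjects
FRONTIER_TRUE_RATE = 63.2   # frontier subjects, curated web evidence


def gap_pp(reference: float, observed: float) -> float:
    """Return the difference in percentage points, rounded to one decimal."""
    return round(reference - observed, 1)


# Gaps relative to the benchmark floor; the true gaps are at least this large.
wikipedia_gap = gap_pp(BENCHMARK_FLOOR, WIKIPEDIA_TRUE_RATE)
frontier_gap = gap_pp(BENCHMARK_FLOOR, FRONTIER_TRUE_RATE)

# Additional drop when moving from Wikipedia-covered to frontier subjects.
frontier_drop = gap_pp(WIKIPEDIA_TRUE_RATE, FRONTIER_TRUE_RATE)

print(f"Wikipedia-covered gap: >= {wikipedia_gap} pp")
print(f"Frontier gap:          >= {frontier_gap} pp")
print(f"Wikipedia -> frontier: {frontier_drop} pp")
```

These are percentage-point differences, not relative changes, which matches how the abstract phrases the comparison.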

Abstract

Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. LLMpedia generates encyclopedic articles entirely from parametric memory, producing ~1M articles across three model families without retrieval. For gpt-5-mini, the verifiable true rate on Wikipedia-covered subjects is only 74.7%, more than 15 percentage points below the benchmark-based picture, consistent with the availability bias of fixed-question evaluation. Beyond Wikipedia, frontier subjects verifiable only through curated web evidence fall further to a 63.2% true rate. Wikipedia covers just 61% of surfaced subjects, and three model families overlap by only 7.3% in subject choice. In a capture-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia achieves substantially higher factuality at roughly half the textual similarity to Wikipedia. Unlike Grokipedia, every prompt, artifact, and evaluation verdict is publicly released, making LLMpedia the first fully open parametric encyclopedia, bridging factuality evaluation and knowledge materialization. All data, code, and a browsable interface are at https://llmpedia.net.
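The abstract does not say how the 7.3% subject overlap is computed. One plausible reading, offered here purely as an assumption, is the share of subjects chosen by all three model families among all subjects chosen by any of them. The sketch below implements that reading on toy data. The function name and the subject labels are hypothetical and do not come from the paper or its released code.

```python
def three_way_overlap(a: set[str], b: set[str], c: set[str]) -> float:
    """Fraction of subjects shared by all three sets, relative to their union.

    Assumed definition for illustration only; the paper may use another.
    """
    union = a | b | c
    if not union:
        return 0.0  # no subjects at all, so define the overlap as zero
    return len(a & b & c) / len(union)


# Toy subject sets standing in for three model families (hypothetical labels).
family_1 = {"subject_a", "subject_b", "subject_c"}
family_2 = {"subject_b", "subject_c", "subject_d"}
family_3 = {"subject_c", "subject_d", "subject_e"}

overlap = three_way_overlap(family_1, family_2, family_3)
print(f"Three-way overlap: {overlap:.1%}")
```

Under this definition a low value means the families mostly surface different subjects, which is the point the abstract makes about fixed subject lists covering only part of what each model chooses to write about.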


