Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity

arXiv cs.LG / 4/29/2026


Key Points

  • The paper proposes Incompressible Knowledge Probes (IKPs), a 1,400-question factual benchmark meant to infer lower bounds on a closed-box LLM’s parameter count from how much factual knowledge it can reliably produce.
  • It calibrates a log-linear relationship between IKP accuracy and parameter counts using 89 open-weight models (135M–1,600B parameters) across 19 vendors, achieving strong fit (R²=0.917) and good cross-validation generalization.
  • For Mixture-of-Experts (MoE) models, total parameters correlate with factual knowledge substantially better than active parameters, suggesting the probe is sensitive to overall capacity rather than just routing-time computation.
  • The authors analyze 188 models across 27 vendors to estimate “effective knowledge capacity” of major proprietary frontier models, noting that safety/refusal behavior can cause the estimates to be conservative lower bounds.
  • Contrary to claims of scaling saturation, the paper finds that factual capacity continues to scale log-linearly with parameters across model generations and vendors, with the IKP time coefficient statistically indistinguishable from zero for open-weight models.
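The calibration and leave-one-out validation described above can be sketched in a few lines. This is a minimal illustration of the fitting procedure, not the paper's actual pipeline: the `accuracy` and `params_billions` arrays are made-up placeholder values, not the paper's measurements.

```python
import numpy as np

# Hypothetical calibration data: IKP accuracy vs. known parameter counts
# (illustrative values only; not the paper's measurements).
accuracy = np.array([0.12, 0.21, 0.34, 0.47, 0.58, 0.66])
params_billions = np.array([0.135, 1.0, 7.0, 30.0, 70.0, 400.0])

# Fit log10(parameters) as a linear function of IKP accuracy.
slope, intercept = np.polyfit(accuracy, np.log10(params_billions), 1)

def estimate_params(acc: float) -> float:
    """Predicted parameter count (billions) for a given IKP accuracy."""
    return 10 ** (slope * acc + intercept)

# Leave-one-out cross-validation: refit without each point and record
# the multiplicative fold error (how many times off the prediction is).
errors = []
for i in range(len(accuracy)):
    mask = np.arange(len(accuracy)) != i
    s, b = np.polyfit(accuracy[mask], np.log10(params_billions[mask]), 1)
    pred = 10 ** (s * accuracy[i] + b)
    errors.append(max(pred / params_billions[i], params_billions[i] / pred))

print(f"median fold error: {np.median(errors):.2f}x")
```

Fitting in log-parameter space is what makes the relationship linear; a fold error of 2.0 then means the held-out model's size was predicted within a factor of two.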

Abstract

Closed-source frontier labs do not disclose parameter counts, and the standard alternative, inference economics, carries more than 2× uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exploit a tighter intrinsic bound: storing F facts requires at least F/(bits per parameter) weights, so measuring how much a model *knows* lower-bounds how many parameters it *has*. We introduce **Incompressible Knowledge Probes (IKPs)**, a benchmark of 1,400 factual questions spanning 7 tiers of obscurity, designed to isolate knowledge that cannot be derived by reasoning or compressed by architectural improvements. We calibrate a log-linear mapping from IKP accuracy to parameter count on 89 open-weight models (135M–1,600B) spanning 19 vendors, achieving R² = 0.917; leave-one-out cross-validation confirms generalization (median fold error 1.59×, with 68.5% of folds within 2× and 87.6% within 3×). For Mixture-of-Experts models, total parameters predict knowledge (R² = 0.79) far better than active parameters (R² = 0.51). We evaluate 188 models from 27 vendors and estimate effective knowledge capacity for all major proprietary frontier models; for heavily safety-tuned models the estimates are lower bounds, since refusal policy can hide tens of percentage points of "refused but known" capacity. The widely reported saturation of reasoning benchmarks does not imply the end of scaling. Procedural capability compresses under the "Densing Law," but across 96 dated open-weight models the IKP time coefficient is −0.0010/month (95% CI [−0.0031, +0.0008]), indistinguishable from zero and rejecting the Densing prediction of +0.0117/month at p < 10⁻¹⁵. Factual capacity continues to scale log-linearly with parameters across generations and across vendors.
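The abstract's intrinsic bound (storing F facts requires at least F/(bits per parameter) weights) is simple arithmetic; the sketch below uses placeholder numbers, not the paper's estimates. Both the bits-per-fact figure and the 2-bits-per-parameter storage capacity are assumptions chosen for illustration.

```python
# Illustrative lower-bound arithmetic for the fact-storage argument.
# All three inputs are assumed values, not the paper's measurements.
facts_known = 2e10       # facts the probe suggests the model reliably stores
bits_per_fact = 24       # assumed information content of one independent fact
bits_per_param = 2.0     # assumed knowledge capacity of a single parameter

# Total bits needed, divided by bits each weight can hold.
min_params = facts_known * bits_per_fact / bits_per_param
print(f"parameter lower bound: {min_params:.2e}")  # 2.40e+11
```

Under these assumptions, a model demonstrating that much recall must have at least ~240B parameters; since a probe only ever reveals a subset of what the model knows (refusals included), the bound is conservative.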