Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation
arXiv cs.CL / April 22, 2026
Key Points
- The paper investigates uncertainty quantification for black-box LLM queries where only a small number of responses can be sampled, making accurate risk estimation difficult.
- It uses the “effective semantic alphabet size” (the number of distinct meanings the model's response distribution can produce, estimated from sampled responses) as a proxy for downstream hallucination risk.
- With few samples, frequency-only estimators miss rare semantic modes (the unseen mass), while graph-spectral measures alone cannot reliably estimate semantic occupancy, i.e., how much of the meaning space the samples actually cover; a coverage sketch follows this list.
- The authors propose SHADE, which fuses a generalized Good-Turing coverage estimate with a heat-kernel trace over an entailment-weighted graph of sampled responses, using adaptive fusion rules driven by the estimated coverage (a heat-kernel sketch appears after the coverage example).
- Experiments on semantic alphabet-size estimation and QA incorrectness detection show SHADE provides the largest gains in the most sample-limited settings, with improvements diminishing as sample counts grow.
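To make the coverage idea concrete, here is a minimal sketch of plain Good-Turing coverage plus a Chao1-style lower bound on the alphabet size. This is not the paper's generalized estimator; the function name is ours, and we assume responses have already been grouped into semantic clusters (e.g., by entailment-based clustering).

```python
from collections import Counter

def coverage_and_alphabet(cluster_labels: list[int]) -> tuple[float, float]:
    """Turing coverage estimate and a Chao1-style alphabet-size estimate.

    `cluster_labels` assigns each sampled response to a semantic cluster.
    """
    n = len(cluster_labels)
    counts = Counter(cluster_labels)
    f1 = sum(1 for c in counts.values() if c == 1)  # clusters seen once
    f2 = sum(1 for c in counts.values() if c == 2)  # clusters seen twice

    # Good-Turing coverage: estimated probability mass of meanings
    # already observed. Many singletons => much mass still unseen.
    coverage = 1.0 - f1 / n

    # Chao1 lower bound on the total number of semantic clusters:
    # observed clusters plus an estimate of the unseen ones
    # (bias-corrected form to avoid dividing by zero when f2 == 0).
    observed = len(counts)
    unseen = f1 * (f1 - 1) / (2 * (f2 + 1))
    return coverage, observed + unseen


# Five samples, three meanings, two seen only once: low coverage
# signals that rare semantic modes are likely still unseen.
cov, k_hat = coverage_and_alphabet([0, 0, 0, 1, 2])
print(f"coverage={cov:.2f}, estimated alphabet size={k_hat:.2f}")
```

With many singletons the coverage estimate drops, which is exactly the regime where the raw count of observed meanings understates the true alphabet size.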

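The heat-kernel trace term can be sketched as follows, assuming we already have a symmetric matrix of pairwise entailment scores in [0, 1]; how the paper weights edges and chooses the diffusion time t are details we do not reproduce, and the names here are illustrative.

```python
import numpy as np

def heat_kernel_trace(entail: np.ndarray, t: float = 1.0) -> float:
    """Soft count of semantic modes via the graph heat-kernel trace.

    tr(exp(-t * L)) of the graph Laplacian equals the number of
    connected components as t -> infinity and interpolates smoothly
    at finite t, so it acts as a soft cluster count over responses.
    """
    W = (entail + entail.T) / 2.0       # symmetrize entailment scores
    np.fill_diagonal(W, 0.0)            # no self-loops
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)     # L is symmetric PSD
    return float(np.exp(-t * eigvals).sum())


# Two tight entailment cliques -> trace near 2 for moderate t.
E = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
print(heat_kernel_trace(E, t=3.0))  # ~2.01
```

A coverage-adaptive fusion in the paper's spirit might, for example, lean on this spectral count when the estimated coverage is low (frequencies are then untrustworthy) and shift weight to the frequency-based estimate as coverage approaches one; the exact rule is SHADE's contribution and is not reconstructed here.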