The Manokhin Probability Matrix: A Diagnostic Framework for Classifier Probability Quality
arXiv stat.ML / 5/6/2026
Key Points
- The Manokhin Probability Matrix is a new diagnostic framework that splits classifier probability quality into two components—reliability (calibration) and resolution (discrimination)—addressing the limitation of the single-number Brier score.
- Classifier performance is mapped onto a 2x2 grid using the Spiegelhalter Z-statistic (calibration) and an expected AUC-ROC rank (discrimination), producing four actionable archetypes: Eagle, Bull, Sloth, and Mole.
- The study of 21 classifiers, 5 post-hoc calibrators, and 30 TabArena-v0.1 binary tasks assigns clear archetypes: CatBoost/TabICL/EBM/TabPFN/GBC/Random Forest as Eagles; XGBoost/LightGBM/HGB as Bulls; SVM/LR/LDA/base-rate predictor as Sloths; and MLP/KNN/Naive Bayes/ExtraTrees as Moles.
- Results show that post-hoc calibration can improve log-loss for Bulls (by 6.5%–12.6%) but may slightly harm Eagles (−2.1%), and a theoretical result (Proposition 1) states that order-preserving post-hoc calibration cannot increase discriminatory power.
- The recommended practice is to decompose Brier score before optimization: optimize for discrimination first, then apply post-hoc calibration to correct reliability, with code and experimental data released on GitHub.
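The 2x2 grid described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Spiegelhalter Z formula is the standard one from the calibration-testing literature, the pairwise AUC is the Mann-Whitney estimator, and the quadrant thresholds (`z_crit=1.96`, `auc_cut=0.8`) and the function names `archetype`, `spiegelhalter_z`, and `auc` are illustrative assumptions, not the study's exact criteria.

```python
import math

def spiegelhalter_z(y, p):
    """Spiegelhalter calibration test: ~N(0, 1) if probabilities p are
    well calibrated against binary outcomes y (illustrative sketch)."""
    num = sum((yi - pi) * (1 - 2 * pi) for yi, pi in zip(y, p))
    var = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p)
    return num / math.sqrt(var)

def auc(y, p):
    """Pairwise (Mann-Whitney) AUC; tied scores count as half a win."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def archetype(y, p, z_crit=1.96, auc_cut=0.8):
    """Map one classifier onto the 2x2 grid (thresholds are illustrative)."""
    calibrated = abs(spiegelhalter_z(y, p)) < z_crit
    discriminative = auc(y, p) >= auc_cut
    if calibrated and discriminative:
        return "Eagle"  # reliable and sharp
    if discriminative:
        return "Bull"   # sharp but miscalibrated
    if calibrated:
        return "Sloth"  # reliable but weakly discriminating
    return "Mole"       # neither
```

On a toy holdout set, `archetype([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])` lands in the Eagle quadrant (|Z| < 1.96, AUC = 1.0), matching the "well-calibrated and discriminative" corner of the matrix.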
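Proposition 1's claim, that order-preserving post-hoc calibration cannot change discriminatory power, follows because AUC depends only on the rank order of the scores. A small self-contained check (the `shrink` map is a toy strictly increasing recalibration invented here for illustration, not one of the paper's five calibrators):

```python
def auc(y, p):
    """Pairwise (Mann-Whitney) AUC; tied scores count as half a win."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def shrink(p):
    """Toy strictly increasing recalibration: pull scores toward 0.5.
    It changes every probability but preserves their ordering."""
    return [0.5 + 0.5 * (pi - 0.5) for pi in p]

y = [0, 0, 1, 0, 1, 1]
p = [0.2, 0.9, 0.6, 0.3, 0.7, 0.8]

# AUC is rank-based, so any monotone transform leaves it unchanged.
assert auc(y, p) == auc(y, shrink(p))
```

This is why the recommended workflow is discrimination first, calibration second: a monotone calibrator can only repair reliability, never buy back lost resolution.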