Quantisation Reshapes the Metacognitive Geometry of Language Models

arXiv cs.CL · April 13, 2026


Key Points

  • The paper finds that quantisation changes LLM “metacognitive efficiency” by restructuring domain-level M-ratio behaviour rather than degrading it uniformly.
  • In experiments with Llama-3-8B-Instruct on the same 3,000 questions, M-ratio profiles across four knowledge domains are uncorrelated between Q5_K_M and f16 (Spearman rho = 0.00): some domains improve after quantisation while others worsen.
  • Type-2 AUROC profiles remain perfectly stable across formats (rho = 1.00), suggesting the effect lies in the M-ratio normalisation and confidence calibration rather than in the underlying discrimination signal.
  • A pre-registered attempt to improve metacognition via domain-conditional confidence-amplification SFT did not generalise: all confirmatory hypotheses were null, and meta-d’ did not improve because the diagnostic profile did not transfer across quantisation formats.
  • The authors release code, pre-registrations, and trial-level data, and warn that systems relying on domain-level M-ratio profiles have an unexamined dependency on inference format; AUROC_2-based systems may be safer.
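Why would AUROC_2 be stable while M-ratio restructures? Type-2 AUROC is rank-based: it only asks whether correct answers receive higher confidence than incorrect ones, so it is invariant to any monotone rescaling of the confidence values. The sketch below (not the authors' code; function name and toy data are illustrative) shows a minimal pairwise implementation of that invariance:

```python
import numpy as np

def type2_auroc(confidence, correct):
    """Type-2 AUROC: probability that a randomly chosen correct trial
    gets higher confidence than a randomly chosen incorrect trial
    (ties count half). Rank-based, so any monotone rescaling of the
    confidence scale leaves it unchanged -- one reason it could stay
    stable across quantisation formats while M-ratio shifts."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = confidence[correct]      # confidences on correct trials
    neg = confidence[~correct]     # confidences on incorrect trials
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")
    # All pairwise comparisons (fine for small n; use rank sums at scale)
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy trials: squaring the (positive) confidences is a monotone
# rescaling, so the Type-2 AUROC is identical before and after.
conf = [0.9, 0.8, 0.6, 0.4, 0.3]
corr = [1, 1, 0, 1, 0]
a1 = type2_auroc(conf, corr)
a2 = type2_auroc(np.square(conf), corr)
assert abs(a1 - a2) < 1e-12
```

M-ratio, by contrast, depends on how confidence is distributed over an absolute criterion scale, so format-induced shifts in calibration can move it even when the rank-order signal is untouched.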

Abstract

We report that model quantisation restructures domain-level metacognitive efficiency in LLMs rather than degrading it uniformly. Evaluating Llama-3-8B-Instruct on the same 3,000 questions at Q5_K_M and f16 precision, we find that M-ratio profiles across four knowledge domains are uncorrelated between formats (Spearman rho = 0.00). Arts & Literature moves from worst-monitored (M-ratio = 0.606 at Q5_K_M) to best-monitored (1.542 at f16). Geography moves from well-monitored (1.210) to under-monitored (0.798). However, Type-2 AUROC profiles are perfectly stable across formats (rho = 1.00), localising the restructuring to the M-ratio normalisation rather than the underlying discrimination signal. This finding emerged from a pre-registered attempt to improve metacognition through domain-conditional training. We prescribed confidence-amplification SFT for the diagnosed weak domain, with matched-budget agnostic and wrong-prescription controls. All four confirmatory hypotheses were null (10,000 bootstrap resamples, seed = 42). The training successfully reshaped confidence distributions, doubling the NLP gap in Science from 0.076 to 0.152, but did not improve meta-d' because the diagnostic profile did not transfer across formats. Any system relying on domain-level M-ratio profiles has an unexamined dependency on inference format. Systems using AUROC_2 are safer. We release all code, pre-registrations, and trial-level data.
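The headline rho = 0.00 is a Spearman rank correlation over just four domain-level M-ratio values per format. A minimal pure-numpy sketch (assuming no ties; this is an illustration, not the authors' released code) shows how such a correlation is computed and what rho = 0.00 looks like at n = 4:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    The double argsort converts values to 0-based ranks; this simple
    form assumes no ties (use average ranks if ties are possible)."""
    rx = np.argsort(np.argsort(x)) - 0.0  # ranks as floats
    ry = np.argsort(np.argsort(y)) - 0.0
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# With four domains, rho = 0.00 means the two rank profiles are
# orthogonal -- neither preserved nor reversed, e.g.:
print(spearman_rho([1, 2, 3, 4], [2, 4, 1, 3]))  # 0.0
```

At n = 4 this is a coarse statistic, which is consistent with the paper's qualitative framing: individual domains swap between best- and worst-monitored across formats (Arts & Literature 0.606 → 1.542, Geography 1.210 → 0.798) rather than shifting by a uniform offset.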