Internal Knowledge Without External Expression: Probing the Generalization Boundary of a Classical Chinese Language Model
arXiv cs.CL / 4/17/2026
Key Points
- The paper trains a 318M-parameter Transformer from scratch on 1.56B tokens of Classical Chinese only (no English characters or Arabic numerals) and evaluates it with systematic out-of-distribution tests contrasting known and unknown historical events.
- Results show a sharp split between internal and external uncertainty: the model's perplexity rises markedly on fabricated and semi-fabricated events, indicating it internally distinguishes attested from invented facts, yet its generated text fails to reliably express that uncertainty (see the perplexity sketch after this list).
- Across multiple languages/writing systems and eight model sizes (110M–1.56B parameters), the ability to *express* epistemic uncertainty tracks the rhetorical conventions of the training data rather than genuine metacognition.
- The authors introduce a “humility paradox” in the Classical Chinese model (it hedges more on known topics than on unknown ones; see the hedge-rate sketch below) and contrast it with Japanese models that almost never hedge, arguing that metacognitive “I don’t know” behavior requires explicit training signals such as RLHF.
- The study concludes that a language model can carry meaningful internal uncertainty signals while its outward expression remains uncalibrated, underscoring the limits of classical language modeling for reliable uncertainty communication.
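
For concreteness, here is a minimal sketch of the kind of perplexity probe the internal-uncertainty claim rests on. The model name (`gpt2` as a stand-in, since the paper's Classical Chinese checkpoint is not assumed to be public) and the probe sentences are illustrative placeholders, not the authors' actual setup.

```python
# Minimal perplexity probe: compare an attested statement against a
# fabricated variant. Model name and sentences are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for the paper's 318M Classical Chinese model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under the causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model shifts targets internally
        # and returns mean cross-entropy over predicted tokens.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Hypothetical probe pair: a known event vs. a fabricated one.
known = "In 1066 William of Normandy invaded England."
fabricated = "In 1066 William of Normandy invaded Portugal."

print(f"known: {perplexity(known):.1f}  fabricated: {perplexity(fabricated):.1f}")
# The paper's finding corresponds to the fabricated perplexity being
# much higher: the model's distribution treats the invented fact as
# unlikely even if its generated text never hedges about it.
```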
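And a toy sketch of the complementary *external* measurement: how often generated text actually hedges. The English marker list is purely illustrative; the paper analyzes Classical Chinese rhetorical conventions, not these phrases.

```python
# Toy hedge-rate metric over generated continuations. The marker list
# is an illustrative stand-in for the paper's Classical Chinese markers.
HEDGE_MARKERS = ("perhaps", "it is said", "reportedly", "may have", "unclear")

def hedge_rate(generations: list[str]) -> float:
    """Fraction of generations containing at least one hedge marker."""
    hedged = sum(any(m in g.lower() for m in HEDGE_MARKERS) for g in generations)
    return hedged / len(generations)

# The "humility paradox" corresponds to hedge_rate on known-topic
# generations exceeding hedge_rate on unknown-topic generations,
# the reverse of what a calibrated model would show.
```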