Learn by Surprise, Commit by Proof

arXiv cs.LG / 4/3/2026


Key Points

  • The paper proposes LSCP, a self-gated post-training method that triggers learning only when per-token loss is anomalously high, signaling that the model may lack the information.
  • LSCP uses an internal Q&A chain to force self-verification and articulate knowledge gaps without relying on an external oracle, and it adjusts optimizer behavior via AdamW β2 scaled by “conviction depth.”
  • A single parameter r controls learning intensity, and the approach is described as self-extinguishing: as the model improves on learned passages, it converges toward standard AdamW rather than continuing heavy updates.
  • Experiments on Qwen3-14B and six models (8B–32B) across four families report that standard fine-tuning leads to rote memorization, while all LSCP conditions learn semantically, and gating helps protect neighboring knowledge.
  • The authors frame the mechanism as a computational analogue of biological memory consolidation, turning temporary contextual information into more stable parametric knowledge over training.

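The gating and optimizer-scaling rule described above can be sketched in a few lines. This is our illustrative reconstruction, not the paper's code: the function names, the mean-loss aggregation for the surprise gate, and the threshold handling are all assumptions.

```python
def gated_beta2(r: float, k: int, base: float = 0.999) -> float:
    """Paper's stated schedule: beta_2 = 0.999 * r^k, where k is the
    conviction depth (number of self-verification steps the passage
    survives). At r = 1.0 this reduces to the standard AdamW default."""
    return base * (r ** k)


def should_learn(per_token_losses: list[float], threshold: float) -> bool:
    """Hypothetical surprise gate: trigger learning only when the
    passage's per-token loss is anomalously high. We assume a mean
    aggregation; the paper's exact statistic may differ."""
    mean_loss = sum(per_token_losses) / len(per_token_losses)
    return mean_loss > threshold
```

Note the self-extinguishing behavior falls out directly: as training drives per-token loss below the threshold, `should_learn` stops firing and the optimizer runs with the unmodified AdamW β2.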
Abstract

We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, generates a Q&A chain that forces the model to articulate its own knowledge and identify gaps, then adjusts AdamW's β2 in proportion to the conviction depth k (the number of self-verification steps the passage survives) via β2 = 0.999 · r^k. The entire learning intensity is governed by a single parameter r. Beyond new knowledge, this process sharpens weakly encoded existing knowledge, a primary source of hallucination. The framework is self-extinguishing: as the model learns, per-token loss on learned passages decreases toward the surprisal threshold and the system progressively converges to standard AdamW. This models biological memory consolidation: temporary information in the context window is selectively consolidated into parametric weights, the model's long-term memory. Experiments on the reference model (Qwen3-14B) and across six models (8B–32B, four families) show that standard fine-tuning produces rote memorization (perturbation gap, the ratio of paraphrase to original perplexity, of 11.6 ± 0.2× baseline) while all LSCP conditions learn semantically (2.7–3.0×). The r = 1.0 condition (identical optimizer, nearly identical data, only the Q&A format differs) confirms that the training-data format, not β2 gating, is the primary mechanism preventing memorization; gating instead protects neighboring knowledge from contamination by corrupt content (93 ± 7% accuracy on adjacent questions at r = 0.98 vs. 90% for the baseline).
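The abstract's headline metric, the perturbation gap, is defined as the ratio of paraphrase perplexity to original-passage perplexity. A minimal sketch of that computation, assuming per-token negative log-likelihoods are already available (the helper names are ours):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Standard definition: exp of the mean per-token
    negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def perturbation_gap(paraphrase_nlls: list[float],
                     original_nlls: list[float]) -> float:
    """Ratio of paraphrase perplexity to original perplexity.
    Per the abstract, a large gap (~11.6x baseline) signals rote
    memorization of surface form; a small one (~2.7-3.0x) signals
    semantic learning that transfers to rewordings."""
    return perplexity(paraphrase_nlls) / perplexity(original_nlls)
```

The intuition: a model that memorized the exact wording assigns the original passage very low loss but stumbles on a paraphrase, inflating the ratio; a model that learned the underlying fact scores both similarly.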