Improving Latent Generalization Using Test-time Compute
arXiv cs.LG / 2026/4/3
Key Points
- The paper argues that failures of deductive reasoning in language models stem from weak “latent generalization” in weight-based knowledge acquisition (in-weights learning).
- Instead of task-specific train-time augmentation, it proposes improving latent generalization by training models to "think" at test time with long chain-of-thought (CoT), using reinforcement learning (RL) from correctness feedback.
- Experiments show the test-time thinking approach fixes many latent generalization failures on in-distribution knowledge and, unlike augmentation baselines, can generalize to new out-of-distribution knowledge where no RL was performed.
- On pure reversal tasks, the method does not directly invert stored knowledge; instead, it lifts performance above chance through a generate-and-verify strategy, proposing candidate answers and checking each against forward-direction facts.
- The authors find that factual self-verification remains brittle, so thinking models still trail in-context learning on reversal tasks; even so, they position test-time thinking as a flexible direction for improving generalization.