PRISM: Demystifying Retention and Interaction in Mid-Training
arXiv cs.LG / 3/19/2026
Key Points
- PRISM is an empirical study of mid-training design choices for large language models, based on controlled experiments across seven base models spanning four model families, two architectures, and scales from 3B to 24B parameters.
- Mid-training on roughly 27B high‑quality tokens yields consistent gains on math (+15 to +40 points), code (+5 to +12), and science (+6 to +13) benchmarks while preserving general performance.
- When RL is applied through the full PRISM pipeline, macro-average reasoning scores rise from under 12 to 29–42, whereas applying RL directly to base models is far less effective. Data composition during mid-training, especially the inclusion of science data, drives these gains.
- Mechanistically, mid-training densely reconfigures over 90% of model weights, while RL refinement affects only about 5% of parameters and preserves the representational geometry established in mid-training (CKA > 0.998). RL succeeds only on mid-trained models, underscoring the value of retention-aware mid-training for reliable reasoning enhancement.
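The CKA > 0.998 figure in the last point refers to Centered Kernel Alignment, a similarity measure between the activation matrices two models produce on the same inputs. The study's exact variant is not specified here; a minimal sketch of the common linear-CKA formulation, with a hypothetical `linear_cka` helper, looks like this:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X, Y: (n_samples, n_features) activations from two models on the same
    inputs; the feature dimensions of X and Y may differ. Returns a value
    in [0, 1], with 1 meaning identical representational geometry.
    """
    # Center each feature dimension across samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return numerator / denominator
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so a score near 1 between mid-trained and RL-refined activations indicates RL left the learned geometry essentially intact even while it adjusted a small fraction of the weights.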