PRISM: Demystifying Retention and Interaction in Mid-Training
arXiv cs.LG / 3/19/2026
Key Points
- PRISM is an empirical study of mid-training design choices for large language models, conducting controlled experiments across seven base models from four families, two architectures, and scales from 3B to 24B parameters.
- Mid-training on roughly 27B high‑quality tokens yields consistent gains on math (+15 to +40 points), code (+5 to +12), and science (+6 to +13) benchmarks while preserving general performance.
- When RL is applied through the full PRISM pipeline, macro-average reasoning scores rise from under 12 to 29–42; applying RL directly to base models is far less effective. Data composition during mid-training, especially the inclusion of science data, drives these gains.
- Mechanistically, mid-training densely reconfigures over 90% of model weights, while subsequent RL refinement touches only about 5% of parameters and preserves the mid-trained representational geometry (CKA > 0.998). RL succeeds only on mid-trained models, underscoring the value of retention-aware mid-training for reliable reasoning enhancement.
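The geometry claim above is stated in terms of CKA (centered kernel alignment), a similarity score between two models' activations on the same inputs. As a minimal sketch, here is the standard linear variant of CKA in NumPy; the paper's exact variant and layer choices are not specified here, and the matrices below are synthetic illustrations, not the study's data:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x d1) and Y (n x d2),
    where rows are the same n inputs run through two models.
    Returns a value in [0, 1]; 1 means identical representational
    geometry up to rotation and uniform scaling."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Illustration: a rotated-and-rescaled copy of the activations keeps
# CKA at 1.0, which is the sense in which CKA > 0.998 indicates that
# RL barely moves the mid-trained representational geometry.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random rotation
print(linear_cka(X, 3.0 * X @ Q))  # close to 1.0
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is why it can stay near 1 even when individual weights change.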