Learning is Forgetting: LLM Training As Lossy Compression
arXiv cs.CL / 4/10/2026
Key Points
- The paper proposes viewing large language models as performing lossy compression: as training proceeds, they retain only information from the training data that is relevant to their objectives.
- It argues that representations learned during pre-training can be interpreted through an information-theoretic lens, with results approaching the Information Bottleneck bound for compression in next-sequence prediction (the canonical form of this bound is sketched after this list).
- Experiments across multiple open-weight LLMs show that different model families compress their knowledge differently, likely reflecting variations in data and training recipes.
- The authors claim that a model's degree of compression, i.e., how close it sits to the optimal compression-relevance trade-off, correlates with the amount of information captured and predicts downstream performance across many benchmarks, linking representational structure to practical outcomes.
- The work provides a unified information-theoretic framing intended to be usable at scale for understanding how LLMs learn and for deriving actionable insights about model performance.
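For reference, the canonical Information Bottleneck objective (Tishby, Pereira & Bialek, 1999) is shown below. The paper's exact formulation for next-sequence prediction is not reproduced in this summary, so this is only the standard statement, with X the input, T the learned representation, Y the prediction target, and beta the compression-relevance trade-off coefficient.

```latex
% Canonical Information Bottleneck Lagrangian (Tishby, Pereira & Bialek, 1999).
% X: input, T: learned representation, Y: prediction target.
% beta trades off compression I(X;T) against predictive relevance I(T;Y).
% Note: the paper's exact objective for next-sequence prediction may differ.
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y)
```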