Information Theory and Statistical Learning

arXiv stat.ML / 5/6/2026


Key Points

  • The preprint is a chapter draft for the forthcoming third edition of Cover and Thomas's *Elements of Information Theory*, bridging statistical learning and information theory from both the model-training and performance-limit perspectives.
  • It concentrates on how divergence measures drive model training, covering examples from classical regression through modern generative modeling methods.
  • The chapter introduces key concepts including the evidence lower bound (ELBO), f-divergences, and the Fisher divergence to connect statistical learning objectives with information-theoretic quantities.
  • It provides a notably systematic and explicit derivation for generative diffusion models, aiming to be clearer than typical treatments in the literature.
  • The material is designed to be accessible for advanced undergraduates or first-year graduate students and includes end-of-chapter exercises for classroom or self-study use.
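As a reminder of the three quantities named above, the standard textbook definitions (stated here for orientation, not quoted from the chapter itself) are:

```latex
% ELBO: a lower bound on the log evidence, for a latent-variable model p(x, z)
% and a variational posterior q(z \mid x):
\log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[\log p(x, z) - \log q(z \mid x)\right] \;=\; \mathrm{ELBO}(x)

% f-divergence: for a convex f with f(1) = 0,
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{Q}\!\left[f\!\left(\frac{dP}{dQ}\right)\right]

% Fisher divergence: a squared distance between score functions,
F(p \,\|\, q) \;=\; \mathbb{E}_{p}\!\left[\left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|^2\right]
```

The Kullback–Leibler divergence is the special case of the f-divergence with f(t) = t log t, and the Fisher divergence is the quantity minimized in score-based and diffusion-model training.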

Abstract

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of *Cover and Thomas's Elements of Information Theory*, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu.

Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), f-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
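The link between divergence measures and training objectives mentioned in the abstract can be illustrated with the standard identity H(p, q) = H(p) + KL(p ‖ q): because the entropy H(p) of the data distribution is fixed, minimizing cross-entropy over a model q is equivalent to minimizing KL(p ‖ q). A minimal numerical sketch (the distributions here are illustrative, not from the chapter):

```python
import numpy as np

# Illustrative discrete distributions (not taken from the paper):
p = np.array([0.7, 0.2, 0.1])   # "data" distribution
q = np.array([0.5, 0.3, 0.2])   # model distribution

entropy = -np.sum(p * np.log(p))        # H(p)
cross_entropy = -np.sum(p * np.log(q))  # H(p, q), the usual training loss
kl = np.sum(p * np.log(p / q))          # KL(p || q)

# Cross-entropy decomposes as entropy plus KL divergence, so minimizing
# cross-entropy in q is the same as minimizing KL(p || q).
assert np.isclose(cross_entropy, entropy + kl)
assert kl > 0  # divergence is nonnegative, zero iff p == q
```

This is the mechanism behind, e.g., the logistic-regression and autoregressive-model examples the chapter covers: their maximum-likelihood objectives are cross-entropies, hence KL minimizations.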
