A Model of Understanding in Deep Learning Systems

arXiv cs.AI / 4/7/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a framework for “systematic understanding” in machine learning agents, requiring an internal model, stable bridge principles coupling it to the target system, and reliable predictive capability.
  • It argues that many contemporary deep learning systems can achieve forms of understanding, but do not meet the stronger ideal of scientific understanding.
  • The work introduces the “Fractured Understanding Hypothesis,” claiming deep learning understanding is often symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying.
  • Overall, it provides an evaluative lens for when ML systems’ internal representations constitute genuine understanding versus partial or misaligned modeling.

Abstract

I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems can and often do achieve such understanding. However, they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.
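
To make the evaluative lens concrete, here is a minimal Python sketch of the account as a checklist: the three conditions for systematic understanding, plus the three further respects in which, per the Fractured Understanding Hypothesis, deep learning falls short of the scientific ideal. The paper gives no code; the class and field names (`UnderstandingProfile`, `symbolically_aligned`, etc.) are illustrative assumptions, not the paper's formalism.

```python
from dataclasses import dataclass

@dataclass
class UnderstandingProfile:
    """Illustrative (not the paper's) encoding of the proposed criteria."""
    # Three conditions for systematic understanding:
    tracks_real_regularities: bool  # internal model mirrors genuine structure in the target
    stable_bridge_principles: bool  # model states map to target properties in a stable way
    reliable_prediction: bool       # the model supports dependable predictions
    # Three further marks of the scientific ideal:
    symbolically_aligned: bool      # representations line up with the target's own terms
    explicitly_reductive: bool      # understanding is spelled out in lower-level terms
    strongly_unifying: bool         # many phenomena are connected under few principles

    def systematic_understanding(self) -> bool:
        # All three base conditions are jointly required on this account.
        return (self.tracks_real_regularities
                and self.stable_bridge_principles
                and self.reliable_prediction)

    def scientific_understanding(self) -> bool:
        # The stronger ideal adds alignment, reductiveness, and unification.
        return self.systematic_understanding() and (
            self.symbolically_aligned
            and self.explicitly_reductive
            and self.strongly_unifying
        )

# A deep learning system as the Fractured Understanding Hypothesis describes it:
dl = UnderstandingProfile(
    tracks_real_regularities=True,
    stable_bridge_principles=True,
    reliable_prediction=True,
    symbolically_aligned=False,   # "symbolically misaligned"
    explicitly_reductive=False,   # "not explicitly reductive"
    strongly_unifying=False,      # "only weakly unifying"
)
assert dl.systematic_understanding() and not dl.scientific_understanding()
```

The boolean fields are a deliberate simplification: the paper's conditions are presumably graded rather than all-or-nothing, but the sketch captures the core structure of the argument, that a system can satisfy the conditions for understanding while failing every mark of the scientific ideal.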