A Model of Understanding in Deep Learning Systems

arXiv cs.AI / 4/7/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a framework for “systematic understanding” in machine learning agents, requiring an internal model, stable bridge principles coupling it to the target system, and reliable predictive capability.
  • It argues that many contemporary deep learning systems can achieve forms of understanding, but do not meet the stronger ideal of scientific understanding.
  • The work introduces the “Fractured Understanding Hypothesis,” claiming deep learning understanding is often symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying.
  • Overall, it provides an evaluative lens for when ML systems’ internal representations constitute genuine understanding versus partial or misaligned modeling.

Abstract

I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems can and often do achieve such understanding. However, they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.
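
To make the evaluative lens concrete, here is a minimal Python sketch of the account as a checklist: the three conditions for systematic understanding, plus the three further respects in which, per the Fractured Understanding Hypothesis, deep learning falls short of the scientific ideal. The paper gives no code; the class and field names (`UnderstandingProfile`, `symbolically_aligned`, etc.) are illustrative assumptions, not the paper's formalism.

```python
from dataclasses import dataclass

@dataclass
class UnderstandingProfile:
    """Illustrative (not the paper's) encoding of the proposed criteria."""
    # Three conditions for systematic understanding:
    tracks_real_regularities: bool  # internal model mirrors genuine structure in the target
    stable_bridge_principles: bool  # model states map to target properties in a stable way
    reliable_prediction: bool       # the model supports dependable predictions
    # Three further marks of the scientific ideal:
    symbolically_aligned: bool      # representations line up with the target's own terms
    explicitly_reductive: bool      # understanding is spelled out in lower-level terms
    strongly_unifying: bool         # many phenomena are connected under few principles

    def systematic_understanding(self) -> bool:
        # All three base conditions are jointly required on this account.
        return (self.tracks_real_regularities
                and self.stable_bridge_principles
                and self.reliable_prediction)

    def scientific_understanding(self) -> bool:
        # The stronger ideal adds alignment, reductiveness, and unification.
        return self.systematic_understanding() and (
            self.symbolically_aligned
            and self.explicitly_reductive
            and self.strongly_unifying
        )

# A deep learning system as the Fractured Understanding Hypothesis describes it:
dl = UnderstandingProfile(
    tracks_real_regularities=True,
    stable_bridge_principles=True,
    reliable_prediction=True,
    symbolically_aligned=False,   # "symbolically misaligned"
    explicitly_reductive=False,   # "not explicitly reductive"
    strongly_unifying=False,      # "only weakly unifying"
)
assert dl.systematic_understanding() and not dl.scientific_understanding()
```

The boolean fields are a deliberate simplification: the paper's conditions are presumably graded rather than all-or-nothing, but the sketch captures the core structure of the argument, that a system can satisfy the conditions for understanding while failing every mark of the scientific ideal.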