Leakage and Interpretability in Concept-Based Models
arXiv stat.ML / 3/25/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- Concept-Based Models are designed to improve interpretability by predicting intermediate, human-understandable concepts, but this interpretability can be undermined by information leakage embedded in the learned concept representations.
- The paper introduces an information-theoretic framework that defines two quantitative metrics, concepts-task leakage (CTL) and interconcept leakage (ICL), to rigorously characterize and measure leakage (a rough probe-based proxy is sketched after this list).
- The CTL and ICL scores are shown to strongly predict how models will behave under interventions and to outperform existing leakage-related measures.
- The authors identify the main causes of leakage and, in a case study of Concept Embedding Models, find additional leakage modes (including interconcept and alignment leakage) beyond the leakage those models exhibit by design.
- The paper concludes with practical design guidelines intended to reduce leakage and maintain interpretability in concept-based model architectures.
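The exact information-theoretic definitions of CTL and ICL are given in the paper and are not reproduced here. As a rough intuition for what a concept-task leakage score captures, the sketch below approximates it with a probe: how much extra task-predictive information (measured as a held-out log-loss gap, in nats) the soft concept scores carry on top of the ground-truth concepts. All names (`ctl_proxy`, `soft_concepts`, `hard_concepts`) are hypothetical, and this probe-based gap is an illustrative proxy, not the authors' estimator.

```python
# Hypothetical proxy for concept-task leakage: extra task information carried
# by soft concept scores beyond the ground-truth (hard) concepts.
# This is NOT the paper's estimator; it is a simple probe-based sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def ctl_proxy(soft_concepts, hard_concepts, task_labels, seed=0):
    """Held-out log-loss gap (in nats) between a task probe that sees only the
    ground-truth concepts and one that also sees the soft concept scores."""
    Xs_tr, Xs_te, Xh_tr, Xh_te, y_tr, y_te = train_test_split(
        soft_concepts, hard_concepts, task_labels,
        test_size=0.3, random_state=seed)

    # Probe 1: predict the task from ground-truth concepts only.
    p_hard = LogisticRegression(max_iter=1000).fit(Xh_tr, y_tr)
    loss_hard = log_loss(y_te, p_hard.predict_proba(Xh_te))

    # Probe 2: predict the task from ground-truth concepts plus soft scores.
    p_both = LogisticRegression(max_iter=1000).fit(
        np.hstack([Xh_tr, Xs_tr]), y_tr)
    loss_both = log_loss(y_te, p_both.predict_proba(np.hstack([Xh_te, Xs_te])))

    # A positive gap suggests task-relevant information leaking through the
    # soft concept representation.
    return max(loss_hard - loss_both, 0.0)

# Toy usage with synthetic data: the soft scores secretly encode a hidden
# factor that the annotated concepts do not contain.
rng = np.random.default_rng(0)
hard = rng.integers(0, 2, size=(2000, 4)).astype(float)
hidden = rng.integers(0, 2, size=2000)                    # unannotated factor
soft = 0.8 * hard + 0.2 * hidden[:, None] + 0.05 * rng.normal(size=hard.shape)
y = ((hard[:, 0] + hidden) >= 1).astype(int)              # task uses both
print(f"CTL proxy (nats): {ctl_proxy(soft, hard, y):.3f}")
```

In a leak-free model the soft scores would add nothing beyond the ground-truth concepts and this proxy would sit near zero; the synthetic example deliberately plants a hidden factor in the soft scores, so it comes out positive.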