Towards Reasonable Concept Bottleneck Models

arXiv stat.ML / 4/14/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a flexible framework for Concept Bottleneck Models (CBMs) that explicitly encodes practitioners’ prior beliefs about concept–concept (C–C) and concept–task (C→Y) relationships into the model’s reasoning for predictions.
  • It introduces CREAM (Concept REAsoning Models), which can represent arbitrary C–C structures (e.g., mutual exclusivity, hierarchies, correlations) and optionally sparse C→Y links to improve concept-grounded reasoning.
  • The method can include a regularized side-channel to compensate for incomplete concept sets while maintaining competitive task performance and encouraging predictions to be grounded in concepts.
  • To assess interpretability in this setting, the authors define a C→Y-agnostic metric that quantifies how interpretable a model remains when its predictions depend partly on the side-channel.
  • Experiments indicate CREAM can enable efficient interventions, reduce “concept leakage,” and match black-box-level performance even when concepts are missing, with added analysis of how the side-channel impacts interpretability and intervenability.
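The architectural ideas above (an input-to-concept layer, concept groups with built-in structure such as mutual exclusivity, a concept-to-task head, and an optional regularized side-channel) can be illustrated with a toy forward pass. Everything here is a hypothetical sketch, not the paper's CREAM implementation: the class name `ToyCBM`, the weight shapes, and the choice of a softmax to encode the mutually exclusive concept group are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ToyCBM:
    """Toy concept-bottleneck forward pass with an optional side-channel.

    Illustrative assumptions: concepts 0-2 form a mutually exclusive group
    (softmax over the group logits); concepts 3-4 are independent binary
    concepts (sigmoid). The side-channel is a plain linear map from the raw
    input to the task logits, discouraged by an L2 penalty.
    """
    def __init__(self, d_in=8, n_concepts=5, n_classes=3):
        self.W_c = rng.normal(size=(n_concepts, d_in)) * 0.1   # x -> concept logits
        self.W_y = rng.normal(size=(n_classes, n_concepts)) * 0.1  # concepts -> task
        self.W_s = rng.normal(size=(n_classes, d_in)) * 0.1    # side-channel: x -> task

    def concepts(self, x):
        z = self.W_c @ x
        group = softmax(z[:3])   # mutual exclusivity encoded architecturally
        free = sigmoid(z[3:])    # unconstrained binary concepts
        return np.concatenate([group, free])

    def predict(self, x, use_side=True):
        c = self.concepts(x)
        logits = self.W_y @ c
        if use_side:
            # side-channel complements a potentially incomplete concept set
            logits = logits + self.W_s @ x
        return softmax(logits), c

    def side_penalty(self, lam=1e-2):
        # regularization keeps predictions grounded in concepts
        return lam * np.sum(self.W_s ** 2)

model = ToyCBM()
x = rng.normal(size=8)
probs, c = model.predict(x)
```

Turning the side-channel off (`use_side=False`) yields a classic hard-bottleneck CBM; the L2 penalty interpolates between the two regimes, which is the trade-off the paper's C→Y-agnostic metric is designed to measure.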

Abstract

We propose a novel, flexible, and efficient framework for designing Concept Bottleneck Models (CBMs) that enables practitioners to explicitly encode and extend their prior knowledge and beliefs about the concept–concept (C–C) and concept–task (C→Y) relationships within the model's reasoning when making predictions. The resulting Concept REAsoning Models (CREAMs) architecturally encode arbitrary types of C–C relationships, such as mutual exclusivity, hierarchical associations, and/or correlations, as well as potentially sparse C→Y relationships. Moreover, CREAM can optionally incorporate a regularized side-channel to complement potentially incomplete concept sets, achieving competitive task performance while encouraging predictions to be concept-grounded. To evaluate CBMs in such settings, we introduce a C→Y-agnostic metric that quantifies interpretability when predictions partially rely on the side-channel. In our experiments, we show that, without additional computational overhead, CREAM models support efficient interventions, can avoid concept leakage, and achieve black-box-level performance under missing concepts. We further analyze how an optional side-channel affects interpretability and intervenability. Importantly, the side-channel enables CBMs to remain effective even in scenarios where only a limited number of concepts are available.
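The "efficient interventions" claimed in the abstract refer to a standard CBM capability: at test time, an expert can overwrite a predicted concept with its known true value and recompute the task prediction through the concept-to-task head. The sketch below illustrates that mechanic with made-up numbers; the weight matrix and concept values are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical concept -> task weights (3 concepts, 3 classes)
W_y = np.array([[ 2.0, -1.0, 0.0],
                [-1.0,  2.0, 0.0],
                [ 0.0,  0.0, 1.0]])

# the model's (imperfect) predicted concept activations
c_pred = np.array([0.2, 0.7, 0.5])

def intervene(c, idx, value):
    """Test-time concept intervention: replace one concept with its true value."""
    c = c.copy()
    c[idx] = value
    return c

before = softmax(W_y @ c_pred)
# an expert corrects concept 0 to 'present' (1.0); the fix can flip the class
after = softmax(W_y @ intervene(c_pred, 0, 1.0))
```

Because the task head reads only from the bottleneck, the corrected concept propagates directly to the prediction; this is also why concept leakage matters, since a leaky bottleneck would let the original error survive the intervention.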