Towards Reasonable Concept Bottleneck Models

arXiv stat.ML / 4/14/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a flexible framework for Concept Bottleneck Models (CBMs) that explicitly encodes practitioners’ prior beliefs about concept–concept (C–C) and concept–task (C→Y) relationships into the model’s reasoning for predictions.
  • It introduces CREAM (Concept REAsoning Models), which can represent arbitrary C–C structures (e.g., mutual exclusivity, hierarchies, correlations) and optionally sparse C→Y links to improve concept-grounded reasoning.
  • The method can include a regularized side-channel to compensate for incomplete concept sets while maintaining competitive task performance and encouraging predictions to be grounded in concepts.
  • To assess interpretability in this setting, the authors define a C→Y-agnostic metric that quantifies how interpretable a model remains when its predictions depend partly on the side-channel.
  • Experiments indicate CREAM can enable efficient interventions, reduce “concept leakage,” and match black-box-level performance even when concepts are missing, with added analysis of how the side-channel impacts interpretability and intervenability.
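The architectural ideas above (an input-to-concept layer, concept groups with built-in structure such as mutual exclusivity, a concept-to-task head, and an optional regularized side-channel) can be illustrated with a toy forward pass. Everything here is a hypothetical sketch, not the paper's CREAM implementation: the class name `ToyCBM`, the weight shapes, and the choice of a softmax to encode the mutually exclusive concept group are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ToyCBM:
    """Toy concept-bottleneck forward pass with an optional side-channel.

    Illustrative assumptions: concepts 0-2 form a mutually exclusive group
    (softmax over the group logits); concepts 3-4 are independent binary
    concepts (sigmoid). The side-channel is a plain linear map from the raw
    input to the task logits, discouraged by an L2 penalty.
    """
    def __init__(self, d_in=8, n_concepts=5, n_classes=3):
        self.W_c = rng.normal(size=(n_concepts, d_in)) * 0.1   # x -> concept logits
        self.W_y = rng.normal(size=(n_classes, n_concepts)) * 0.1  # concepts -> task
        self.W_s = rng.normal(size=(n_classes, d_in)) * 0.1    # side-channel: x -> task

    def concepts(self, x):
        z = self.W_c @ x
        group = softmax(z[:3])   # mutual exclusivity encoded architecturally
        free = sigmoid(z[3:])    # unconstrained binary concepts
        return np.concatenate([group, free])

    def predict(self, x, use_side=True):
        c = self.concepts(x)
        logits = self.W_y @ c
        if use_side:
            # side-channel complements a potentially incomplete concept set
            logits = logits + self.W_s @ x
        return softmax(logits), c

    def side_penalty(self, lam=1e-2):
        # regularization keeps predictions grounded in concepts
        return lam * np.sum(self.W_s ** 2)

model = ToyCBM()
x = rng.normal(size=8)
probs, c = model.predict(x)
```

Turning the side-channel off (`use_side=False`) yields a classic hard-bottleneck CBM; the L2 penalty interpolates between the two regimes, which is the trade-off the paper's C→Y-agnostic metric is designed to measure.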

Abstract

We propose a novel, flexible, and efficient framework for designing Concept Bottleneck Models (CBMs) that enables practitioners to explicitly encode and extend their prior knowledge and beliefs about the concept–concept (C–C) and concept–task (C→Y) relationships within the model's reasoning when making predictions. The resulting Concept REAsoning Models (CREAMs) architecturally encode arbitrary types of C–C relationships, such as mutual exclusivity, hierarchical associations, and/or correlations, as well as potentially sparse C→Y relationships. Moreover, CREAM can optionally incorporate a regularized side-channel to complement potentially incomplete concept sets, achieving competitive task performance while encouraging predictions to be concept-grounded. To evaluate CBMs in such settings, we introduce a C→Y-agnostic metric that quantifies interpretability when predictions partially rely on the side-channel. In our experiments, we show that, without additional computational overhead, CREAM models support efficient interventions, can avoid concept leakage, and achieve black-box-level performance under missing concepts. We further analyze how an optional side-channel affects interpretability and intervenability. Importantly, the side-channel enables CBMs to remain effective even in scenarios where only a limited number of concepts are available.
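The "efficient interventions" claimed in the abstract refer to a standard CBM capability: at test time, an expert can overwrite a predicted concept with its known true value and recompute the task prediction through the concept-to-task head. The sketch below illustrates that mechanic with made-up numbers; the weight matrix and concept values are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical concept -> task weights (3 concepts, 3 classes)
W_y = np.array([[ 2.0, -1.0, 0.0],
                [-1.0,  2.0, 0.0],
                [ 0.0,  0.0, 1.0]])

# the model's (imperfect) predicted concept activations
c_pred = np.array([0.2, 0.7, 0.5])

def intervene(c, idx, value):
    """Test-time concept intervention: replace one concept with its true value."""
    c = c.copy()
    c[idx] = value
    return c

before = softmax(W_y @ c_pred)
# an expert corrects concept 0 to 'present' (1.0); the fix can flip the class
after = softmax(W_y @ intervene(c_pred, 0, 1.0))
```

Because the task head reads only from the bottleneck, the corrected concept propagates directly to the prediction; this is also why concept leakage matters, since a leaky bottleneck would let the original error survive the intervention.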