The Query Channel: Information-Theoretic Limits of Masking-Based Explanations

arXiv cs.AI / 4/21/2026


Key Points

  • The paper reinterprets masking-based post-hoc explanation methods (e.g., KernelSHAP, LIME) as a communication problem over a “query channel,” where each masked model evaluation is treated like a channel use.
  • It characterizes the complexity of an explanation through the entropy of the hypothesis class and defines a per-query identification capacity that limits how much information each query can deliver.
  • A strong converse result shows that when the required explanation recovery rate exceeds this capacity, exact recovery becomes impossible: the probability of error goes to one regardless of the explainer/decoder sequence.
  • The authors also provide an achievability theorem, showing that at rates below capacity, a sparse maximum-likelihood decoder attains reliable exact recovery.
  • Experiments and benchmarks (including a Monte Carlo mutual-information estimator) show information-theoretic conditions where explanations are theoretically feasible while common convex surrogates can still fail, and they analyze how resolution/tokenization choices and noise degrade the “channel.”
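The capacity comparison in the bullets above can be made concrete with a toy sketch. The snippet below is an illustrative Monte Carlo mutual-information estimate, not the paper's estimator: it assumes a hypothetical setting where the explanation is a k-sparse support over d features, each query applies a random binary mask, and the model returns the masked sum plus Gaussian noise. The per-query information is then compared against the entropy of the hypothesis class to get a query-budget floor.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 8, 2, 0.5  # illustrative sizes, not from the paper

# Hypothesis class: all k-sparse supports over d features (uniform prior)
supports = [np.array([1.0 if i in c else 0.0 for i in range(d)])
            for c in itertools.combinations(range(d), k)]
H_bits = math.log2(len(supports))  # entropy of the hypothesis class

def gauss_logpdf(y, mu, s=sigma):
    return -0.5 * ((y - mu) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))

# Monte Carlo estimate of the per-query mutual information I(S; Y | M):
# average log-ratio of the conditional likelihood to the class marginal.
N = 20000
total_bits = 0.0
for _ in range(N):
    s = supports[rng.integers(len(supports))]  # draw a latent explanation
    m = rng.integers(0, 2, size=d)             # one random binary mask (query)
    y = float(m @ s) + rng.normal(0.0, sigma)  # noisy masked response
    log_cond = gauss_logpdf(y, float(m @ s))
    log_marg = np.logaddexp.reduce(
        [gauss_logpdf(y, float(m @ sp)) for sp in supports]
    ) - math.log(len(supports))
    total_bits += (log_cond - log_marg) / math.log(2)
I_per_query = total_bits / N

print(f"H(class) = {H_bits:.2f} bits, I per query ≈ {I_per_query:.2f} bits")
print(f"information-theoretic query floor ≈ {H_bits / I_per_query:.1f} queries")
```

The ratio H(class) / I(per query) is the non-asymptotic flavor of benchmark the paper uses: no decoder, convex or otherwise, can reliably identify the explanation with fewer queries than this floor.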

Abstract

Masking-based post-hoc explanation methods, such as KernelSHAP and LIME, estimate local feature importance by querying a black-box model under randomized perturbations. This paper formulates this procedure as communication over a query channel, where the latent explanation acts as a message and each masked evaluation is a channel use. Within this framework, the complexity of the explanation is captured by the entropy of the hypothesis class, while the query interface supplies information at a rate determined by an identification capacity per query. We derive a strong converse showing that, if the explanation rate exceeds this capacity, the probability of error necessarily converges to one for any sequence of explainers and decoders. We also prove an achievability result establishing that a sparse maximum-likelihood decoder attains reliable recovery when the rate lies below capacity. A Monte Carlo estimator of mutual information yields a non-asymptotic query benchmark that we use to compare optimal decoding with Lasso- and OLS-based procedures that mirror LIME and KernelSHAP. Experiments reveal a range of query budgets where information theory permits reliable explanations but standard convex surrogates still fail. Finally, we interpret super-pixel resolution and tokenization for neural language models as a source-coding choice that sets the entropy of the explanation and show how Gaussian noise and nonlinear curvature degrade the query channel, induce waterfall and error-floor behavior, and render high-resolution explanations unattainable.
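The decoder comparison in the abstract can be illustrated with a minimal sketch, assuming the same toy masking setup as above (k-sparse support, random binary masks, Gaussian noise); this is not the paper's experimental protocol. The "sparse maximum-likelihood decoder" here is an exhaustive least-squares search over all k-sparse supports, and the convex surrogate is an OLS fit followed by a top-k threshold, loosely mirroring the KernelSHAP/LIME-style procedures mentioned above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, k, sigma, n_queries, trials = 8, 2, 0.5, 12, 200  # illustrative sizes

# Hypothesis class: index tuples of the k active features
supports = list(itertools.combinations(range(d), k))

def run_trial():
    true = supports[rng.integers(len(supports))]
    s = np.zeros(d)
    s[list(true)] = 1.0
    M = rng.integers(0, 2, size=(n_queries, d)).astype(float)  # random masks
    y = M @ s + rng.normal(0.0, sigma, n_queries)              # noisy responses
    # Sparse maximum-likelihood decoder: exhaustive search over k-sparse
    # supports, picking the one with the smallest squared residual.
    best = min(supports,
               key=lambda sp: float(np.sum((y - M[:, list(sp)].sum(axis=1)) ** 2)))
    ml_ok = best == true
    # OLS surrogate: fit linear weights, keep the top-k coordinates.
    w, *_ = np.linalg.lstsq(M, y, rcond=None)
    ols_ok = set(np.argsort(w)[-k:].tolist()) == set(true)
    return ml_ok, ols_ok

res = np.array([run_trial() for _ in range(trials)])
print(f"exact-recovery rate  ML: {res[:, 0].mean():.2f}  "
      f"OLS top-k: {res[:, 1].mean():.2f}")
```

Sweeping `n_queries` in a sketch like this is one way to expose the gap the paper reports: a band of budgets where the ML decoder recovers the support reliably while the convex surrogate still falls short of exact recovery.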