Barriers to Counterfactual Credit Attribution for Autoregressive Models

arXiv cs.LG / 5/5/2026


Key Points

  • The paper revisits counterfactual credit attribution (CCA) as a technical framework for determining which prior work a generative model’s output significantly depends on.
  • It studies CCA for autoregressive generative models that need to attribute credit to a deployment-time dataset such as a RAG database.
  • The authors show a key limitation: enforcing CCA on the underlying next-token predictor does not ensure the full autoregressive model satisfies CCA, because CCA does not compose autoregressively (unlike differential privacy).
  • They also consider an alternative approach, “retrofitting,” which takes a model that does not attribute credit and adds credit on top of it; they prove that, under a weak optimality requirement, black-box retrofitting requires a number of queries exponential in the length of the model’s outputs.
  • Overall, the work identifies fundamental barriers to making practical CCA-style attribution workable for autoregressive systems.
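
For contrast, the composition behavior the paper shows CCA lacks is exactly what makes differential privacy tractable for autoregressive generation. The following is a standard basic-composition calculation in conventional DP notation (not taken from the paper): if each token-sampling step $M_t$ is $\varepsilon$-DP with respect to the deployment dataset, then the full $T$-token model is $T\varepsilon$-DP, because per-step guarantees multiply across the chain. For neighboring datasets $D, D'$:

```latex
\Pr[M(D)=y_{1:T}]
  = \prod_{t=1}^{T}\Pr\!\big[M_t(D)=y_t \,\big|\, y_{<t}\big]
  \le \prod_{t=1}^{T} e^{\varepsilon}\,\Pr\!\big[M_t(D')=y_t \,\big|\, y_{<t}\big]
  = e^{T\varepsilon}\,\Pr[M(D')=y_{1:T}].
```

The paper’s first barrier is that no analogous step-to-whole argument goes through for CCA: a next-token predictor satisfying CCA at every step need not yield a CCA autoregressive model.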

Abstract

Generative AI disrupts the practice of giving credit to work that came before. Ideally, a generative model would give credit to any work on which its output depends in a significant way. *Counterfactual credit attribution* (CCA) is a technical condition formalizing this goal, a relaxation of differential privacy recently introduced by Livni, Moran, Nissim, and Pabbaraju [2024], who studied it in the PAC learning setting. We initiate the study of CCA generative models. Specifically, we consider autoregressive models giving credit to a deployment-time dataset (e.g., a RAG database). We uncover barriers to two natural approaches to CCA autoregressive models. First, we show that imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). Second, we consider a different approach to building CCA models which we call *retrofitting*. Retrofitting takes a model that does not attribute credit, and adds credit onto it. We prove a lower bound for CCA retrofitting under a weak optimality requirement. Given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.
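
To make the abstract's phrase “relaxation of differential privacy” concrete, here is one plausible shape such a condition could take; this is an illustrative paraphrase in our own notation ($\varepsilon$, $\delta$, $\mathrm{credits}$), not the paper’s formal definition. The idea: every deployment-time data point is either credited, or its removal barely changes the output distribution (a DP-style bound, but required only for uncredited points):

```latex
\forall\, z \in D:\quad
z \notin \mathrm{credits}(M, D)
\;\Longrightarrow\;
\Pr[M(D) \in E] \;\le\; e^{\varepsilon}\,\Pr[M(D \setminus \{z\}) \in E] + \delta
\quad \text{for all events } E.
```

Under a reading like this, DP is the special case where no point is ever credited and the bound must hold for all of $D$, which is why CCA is a relaxation; the paper’s results show that even this weaker requirement is hard to achieve autoregressively or to retrofit with black-box access.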