Barriers to Counterfactual Credit Attribution for Autoregressive Models

arXiv cs.LG / 5/5/2026


Key Points

  • The paper revisits counterfactual credit attribution (CCA) as a technical framework for determining which prior work a generative model’s output significantly depends on.
  • It studies CCA for autoregressive generative models that need to attribute credit to a deployment-time dataset such as a RAG database.
  • The authors show a key limitation: enforcing CCA on the underlying next-token predictor does not ensure the full autoregressive model satisfies CCA, because CCA does not compose autoregressively (unlike differential privacy).
  • They also consider an alternative approach, “retrofitting,” which takes a model that does not attribute credit and adds credit on top of it; they prove that, under a weak optimality requirement, black-box retrofitting requires a number of queries exponential in the length of the model’s outputs.
  • Overall, the work identifies fundamental barriers to making practical CCA-style attribution workable for autoregressive systems.
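
For contrast, the composition behavior the paper shows CCA lacks is exactly what makes differential privacy tractable for autoregressive generation. The following is a standard basic-composition calculation in conventional DP notation (not taken from the paper): if each token-sampling step $M_t$ is $\varepsilon$-DP with respect to the deployment dataset, then the full $T$-token model is $T\varepsilon$-DP, because per-step guarantees multiply across the chain. For neighboring datasets $D, D'$:

```latex
\Pr[M(D)=y_{1:T}]
  = \prod_{t=1}^{T}\Pr\!\big[M_t(D)=y_t \,\big|\, y_{<t}\big]
  \le \prod_{t=1}^{T} e^{\varepsilon}\,\Pr\!\big[M_t(D')=y_t \,\big|\, y_{<t}\big]
  = e^{T\varepsilon}\,\Pr[M(D')=y_{1:T}].
```

The paper’s first barrier is that no analogous step-to-whole argument goes through for CCA: a next-token predictor satisfying CCA at every step need not yield a CCA autoregressive model.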

Abstract

Generative AI disrupts the practice of giving credit to work that came before. Ideally, a generative model would give credit to any work on which its output depends in a significant way. *Counterfactual credit attribution* (CCA) is a technical condition formalizing this goal, a relaxation of differential privacy recently introduced by Livni, Moran, Nissim, and Pabbaraju [2024], who studied it in the PAC learning setting. We initiate the study of CCA generative models. Specifically, we consider autoregressive models giving credit to a deployment-time dataset (e.g., a RAG database). We uncover barriers to two natural approaches to CCA autoregressive models. First, we show that imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). Second, we consider a different approach to building CCA models which we call *retrofitting*. Retrofitting takes a model that does not attribute credit, and adds credit onto it. We prove a lower bound for CCA retrofitting under a weak optimality requirement. Given black-box access to the starting model, retrofitting requires query complexity exponential in the length of the model's outputs.
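
To make the abstract's phrase “relaxation of differential privacy” concrete, here is one plausible shape such a condition could take; this is an illustrative paraphrase in our own notation ($\varepsilon$, $\delta$, $\mathrm{credits}$), not the paper’s formal definition. The idea: every deployment-time data point is either credited, or its removal barely changes the output distribution (a DP-style bound, but required only for uncredited points):

```latex
\forall\, z \in D:\quad
z \notin \mathrm{credits}(M, D)
\;\Longrightarrow\;
\Pr[M(D) \in E] \;\le\; e^{\varepsilon}\,\Pr[M(D \setminus \{z\}) \in E] + \delta
\quad \text{for all events } E.
```

Under a reading like this, DP is the special case where no point is ever credited and the bound must hold for all of $D$, which is why CCA is a relaxation; the paper’s results show that even this weaker requirement is hard to achieve autoregressively or to retrofit with black-box access.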