Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
arXiv cs.AI / April 16, 2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces Hessian-Enhanced Token Attribution (HETA) to explain how input tokens contribute to outputs in decoder-only (autoregressive) LLMs, where prior attribution methods often break down under causal generation dynamics.
- HETA combines three signals: a semantic transition vector, Hessian-based second-order sensitivity scores, and a KL-divergence measure of the information lost when individual tokens are masked, yielding attributions that are context-aware and causally faithful.
- The framework is evaluated across multiple models and datasets, showing improved attribution faithfulness and closer alignment with human annotations compared with existing attribution approaches.
- The authors also contribute a curated benchmark dataset to systematically assess attribution quality specifically for generative settings.
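To make the KL-divergence component above concrete, here is a minimal sketch of occlusion-style attribution: score each input token by how much the model's next-token distribution shifts (in KL divergence) when that token is masked. The paper's actual scoring combines this with Hessian-based terms the source does not detail, so the toy linear "model", the `mask_id` convention, and all function names here are illustrative assumptions, not HETA's implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) with a small epsilon to avoid log(0).
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def kl_masking_attribution(logits_fn, tokens, mask_id=0):
    """Score each input token by the KL divergence between the
    next-token distribution on the full input and on the input
    with that one token replaced by mask_id."""
    p_full = softmax(logits_fn(tokens))
    scores = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = mask_id
        scores.append(kl(p_full, softmax(logits_fn(masked))))
    return scores

# Toy stand-in for an LLM: logits are a fixed linear map of token ids.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))  # 5 input positions -> 8-way vocabulary

def toy_logits(tokens):
    return np.asarray(tokens, dtype=float) @ W

tokens = [3, 1, 4, 1, 5]
scores = kl_masking_attribution(toy_logits, tokens)
```

In this sketch a larger score means masking that token perturbs the output distribution more, i.e. the token carried more information about the next-token prediction; HETA's contribution is augmenting this kind of first-order occlusion signal with second-order (Hessian) sensitivity.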