ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs
arXiv cs.CL / 3/20/2026
Key Points
- The paper introduces Intervention-Consistent Explanation evaluation (ICE), a framework that compares explanations against matched random baselines via randomized tests under multiple intervention operators, yielding win rates with confidence intervals.
- Evaluating 7 LLMs across 4 English tasks, 6 non-English languages, and 2 attribution methods, the study finds that faithfulness is operator-dependent, with gaps of up to 44 percentage points; deletion inflates estimates on short texts but reverses on long ones.
- Randomized baselines reveal anti-faithfulness in about one-third of configurations, and faithfulness shows essentially no correlation with human plausibility.
- The study highlights dramatic model-language interactions not explained by tokenization, and the authors release the ICE framework and ICEBench benchmark.
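The core statistic described above, a win rate against matched random baselines with a confidence interval, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-instance faithfulness scores and their matched random-baseline scores are hypothetical inputs, and the CI here is a simple percentile bootstrap rather than whatever interval construction ICE actually uses.

```python
import random

def win_rate_with_ci(expl_scores, baseline_scores, n_boot=2000, alpha=0.05, seed=0):
    """Fraction of instances where an explanation's faithfulness score beats its
    matched random baseline under one intervention operator, plus a percentile
    bootstrap confidence interval. Inputs are hypothetical per-instance scores."""
    rng = random.Random(seed)
    # Per-instance win indicators: explanation vs. its matched random baseline.
    wins = [1.0 if e > b else 0.0 for e, b in zip(expl_scores, baseline_scores)]
    point = sum(wins) / len(wins)
    # Percentile bootstrap over instances.
    boots = []
    for _ in range(n_boot):
        sample = [wins[rng.randrange(len(wins))] for _ in wins]
        boots.append(sum(sample) / len(sample))
    boots.sort()
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, (lo, hi)

# Toy example: six hypothetical instances under, say, a deletion operator.
expl = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85]
base = [0.5, 0.6, 0.75, 0.4, 0.65, 0.5]
rate, (lo, hi) = win_rate_with_ci(expl, base)
```

A win rate whose interval sits below 0.5 would correspond to the "anti-faithfulness" the paper reports: the explanation is reliably beaten by its own random baseline.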