Robust Explanations for User Trust in Enterprise NLP Systems
arXiv cs.CL / 4/15/2026
Key Points
- The paper addresses how to evaluate whether token-level explanations for enterprise NLP systems remain robust and trustworthy when models are reachable only through black-box APIs, which precludes typical representation-based explainer methods.
- It proposes a unified black-box robustness evaluation framework built on leave-one-out occlusion and quantifies stability via a “top-token flip rate” under realistic perturbations (swap, deletion, shuffling, and back-translation) at multiple severities; a minimal sketch of both pieces follows this list.
- Experiments on three benchmark datasets and six encoder/decoder models (BERT, RoBERTa, Qwen 7B/14B, Llama 8B/70B), covering 64,800 cases, show that decoder LLMs produce substantially more stable explanations than the encoder baselines, with flip rates 73% lower on average.
- The study finds that explanation stability increases with model scale (roughly a 44% gain from 7B to 70B) and ties robustness to inference cost, yielding a cost–robustness tradeoff curve to guide model and explainer selection for compliance-sensitive deployments.
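
To make the two moving parts concrete, here is a minimal Python sketch of leave-one-out occlusion attribution and the top-token flip rate. This is an illustration under stated assumptions, not the paper's implementation: `classify` is a toy keyword scorer standing in for the black-box model API, only one perturbation family (random word deletion) is shown, and all names and the smoothing choice are hypothetical.

```python
import random

# Toy stand-in for the black-box model API; a real run would query the
# deployed model instead of this keyword heuristic.
POSITIVE = {"great", "excellent", "good", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def classify(text: str) -> float:
    """Return a smoothed P(positive) score from keyword counts."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (1 + pos) / (2 + pos + neg)

def occlusion_attributions(text: str) -> list[tuple[str, float]]:
    """Leave-one-out occlusion: a token's importance is the drop in the
    predicted score when that single token is removed from the input."""
    tokens = text.split()
    base = classify(text)
    return [
        (tok, base - classify(" ".join(tokens[:i] + tokens[i + 1:])))
        for i, tok in enumerate(tokens)
    ]

def top_token(text: str) -> str:
    """Token with the highest occlusion importance score."""
    return max(occlusion_attributions(text), key=lambda pair: pair[1])[0]

def delete_perturbation(text: str, severity: float, rng: random.Random) -> str:
    """One perturbation family (random word deletion); `severity` is the
    probability that each token is dropped."""
    kept = [t for t in text.split() if rng.random() >= severity]
    return " ".join(kept) if kept else text  # never return an empty input

def flip_rate(texts: list[str], severity: float, seed: int = 0) -> float:
    """Fraction of inputs whose top attributed token changes after
    perturbation -- the stability metric described in the key points."""
    rng = random.Random(seed)
    flips = sum(
        top_token(t) != top_token(delete_perturbation(t, severity, rng))
        for t in texts
    )
    return flips / len(texts)

if __name__ == "__main__":
    reviews = [
        "the support team was great but the dashboard is terrible",
        "I love this product and the onboarding was excellent",
    ]
    for severity in (0.1, 0.3, 0.5):
        print(f"severity={severity}: flip rate={flip_rate(reviews, severity):.2f}")
```

In the paper's full setup, the same loop would presumably sweep all four perturbation families and several severities each, with flip rates averaged per model to trace the reported cost–robustness curve.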