Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
arXiv cs.CL / 3/25/2026
Key Points
- The paper argues that high benchmark accuracy in AI-generated text detectors may not reflect true machine-authorship identification in real-world conditions.
- It introduces an interpretable detection framework combining linguistic feature engineering, machine learning, and explainable AI, achieving leaderboard-competitive results (F1=0.9734) on the PAN CLEF 2025 and COLING 2025 shared tasks.
- Cross-domain and cross-generator tests reveal substantial generalization failures: detector performance drops sharply under distribution shift, once texts come from domains or generators outside the training data.
- SHAP-based explanations indicate that the most influential features vary substantially across datasets, suggesting reliance on dataset-specific artifacts rather than stable signals of machine authorship.
- The authors release an open-source Python package that outputs predictions along with instance-level explanations to support replication and more robust detector development.
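The detection pipeline described above (hand-crafted linguistic features, a trained classifier, per-feature attributions) can be sketched minimally. This is not the authors' released package: the feature set, toy corpus, and labels below are invented for illustration, and a linear model's coefficients stand in for the paper's SHAP attributions, since both assign an importance weight to each linguistic feature.

```python
# Illustrative sketch (not the paper's code): surface linguistic features
# feed a linear detector, whose coefficients serve as global per-feature
# attributions. The paper uses SHAP; coefficients play the same role here.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["avg_word_len", "type_token_ratio", "punct_rate"]

def linguistic_features(text: str) -> np.ndarray:
    """Map a document to a small vector of surface linguistic features."""
    words = text.split()
    avg_word_len = float(np.mean([len(w) for w in words])) if words else 0.0
    ttr = len(set(words)) / len(words) if words else 0.0
    punct_rate = sum(c in ".,;:!?" for c in text) / max(len(text), 1)
    return np.array([avg_word_len, ttr, punct_rate])

# Toy corpus: label 1 = "machine-generated", 0 = "human" (invented examples).
docs = [
    ("The model generates fluent, well structured, highly consistent text.", 1),
    ("honestly i dunno, it kinda rambles... but hey, it works!!", 0),
    ("In conclusion, the results demonstrate significant improvements overall.", 1),
    ("we tried a bunch of stuff and some of it stuck, who knew", 0),
]
X = np.vstack([linguistic_features(text) for text, _ in docs])
y = np.array([label for _, label in docs])

clf = LogisticRegression().fit(X, y)
# Inspecting the weights shows which features the detector leans on; the
# paper's finding is that these rankings change sharply across datasets.
for name, weight in zip(FEATURES, clf.coef_[0]):
    print(f"{name}: {weight:+.3f}")
```

Retraining this sketch on a second corpus and comparing the two weight vectors reproduces, in miniature, the paper's cross-dataset attribution comparison: if the top-ranked features differ between corpora, the detector is likely keying on dataset-specific artifacts rather than stable signals of machine authorship.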




