Interpretable Predictability-Based AI Text Detection: A Replication Study
arXiv cs.CL / 3/17/2026
Key Points
- The paper is a replication and extension of the AuTexTification 2023 system for authorship attribution of machine-generated texts, noting that exact replication was hindered by data splits, model availability, and implementation details.
- It adds 26 document-level stylometric features and experiments with newer multilingual language models, applying SHAP to understand feature influence on decisions.
- The study replaces GPT-2 with newer generative models such as Qwen and mGPT for probabilistic features, and uses mDeBERTa-v3-base for contextual representations across English and Spanish.
- The multilingual configuration achieves results comparable to or better than language-specific models on both Subtask 1 (human vs. machine-generated text detection) and Subtask 2 (attribution of the generating model).
- The authors stress that clear documentation is crucial for reliable replication and fair comparison of systems.
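
To make the two feature families in the points above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the chosen statistics, and the handful of stylometric measures are hypothetical and do not reproduce the paper's 26 features. The per-token log-probabilities are assumed to come from a causal language model (for example one of the generative models named above), which is not loaded here.

```python
import math
import re
import statistics


def predictability_features(token_logprobs):
    """Summarise per-token natural-log probabilities for one document.

    Assumption: `token_logprobs` was produced elsewhere by scoring the
    document with a causal language model; this function only aggregates.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_lp = statistics.fmean(token_logprobs)
    return {
        "mean_logprob": mean_lp,
        "std_logprob": statistics.pstdev(token_logprobs),
        "min_logprob": min(token_logprobs),
        # Perplexity is the exponential of the negative mean log-probability.
        "perplexity": math.exp(-mean_lp),
    }


def stylometric_features(text):
    """Compute a few illustrative document-level stylometric features.

    Assumption: these five measures are stand-ins chosen for simplicity.
    """
    words = re.findall(r"\w+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_punct = sum(c in ".,;:!?" for c in text)
    return {
        "n_words": n_words,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "mean_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "punctuation_ratio": n_punct / len(text) if text else 0.0,
    }


if __name__ == "__main__":
    # Toy input: four tokens, each assigned probability 0.5 by the model.
    print(predictability_features([math.log(0.5)] * 4))
    print(stylometric_features("The cat sat. The dog ran!"))
```

Because every feature is a named scalar, the two dictionaries can be merged into one feature vector per document, which is the kind of input for which SHAP can attribute a classifier's decision to individual features.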




