Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
arXiv cs.CL / 5/5/2026
Key Points
- The paper argues that training-free AI text detection based on model log-probabilities has a performance ceiling because RLHF-trained models converge toward human-like probability distributions.
- It proposes a new detection signal based on character distribution signatures, grounded in a theory that AI models reproduce global character patterns while humans show domain-specialized distributions, producing what the authors call a “Wall of Separation.”
- To evaluate this systematically, the authors introduce the MDTA benchmark: 642,274 prompt-aligned samples spanning 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, extending HC3 with responses from modern models and stronger augmentation.
- They define the Letter Distribution Score (LD-Score), which correlates only weakly (r = 0.08–0.13) with perplexity-based methods, and report that combining LD-Score with existing detectors via a non-linear classifier improves AUROC and F1, especially in specialized domains.
- The MDTA dataset is released publicly on Hugging Face for further research and benchmarking.
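The summary does not state the LD-Score formula, so the following is only a minimal sketch of what a character distribution signature could look like. It assumes a case-folded a-z letter histogram with Laplace smoothing, and it uses Jensen-Shannon divergence from a reference "global" profile as a stand-in score. The names `letter_distribution`, `js_divergence`, and `ld_score_sketch`, the smoothing choice, and the divergence measure are all hypothetical; they are not the authors' method. Under the paper's stated theory, text that tracks the global pattern (AI-like) would be expected to score lower than domain-specialized human text.

```python
import math
import string
from collections import Counter
from typing import List, Sequence

LETTERS = string.ascii_lowercase  # the 26 letters a-z; all other characters are ignored


def letter_distribution(text: str, smoothing: float = 1.0) -> List[float]:
    """Case-folded a-z letter frequencies with additive (Laplace) smoothing.

    `smoothing` must be > 0 so every probability stays positive and divergences stay finite.
    """
    counts = Counter(ch for ch in text.lower() if ch in LETTERS)
    total = sum(counts.values()) + smoothing * len(LETTERS)
    return [(counts[ch] + smoothing) / total for ch in LETTERS]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def ld_score_sketch(text: str, reference: Sequence[float]) -> float:
    """Hypothetical LD-style score: distance of a text's letter profile from a global reference."""
    return js_divergence(letter_distribution(text), reference)


if __name__ == "__main__":
    # Toy reference profile; a real one would be estimated from a large pooled corpus.
    reference = letter_distribution("the quick brown fox jumps over the lazy dog " * 50)
    print(ld_score_sketch("Patient presented with acute myocardial infarction.", reference))
    print(ld_score_sketch("The quick brown fox jumps over the lazy dog.", reference))
```

Smoothing matters for short texts: without it, a letter that never appears would get probability zero, and per-domain comparisons would be dominated by sampling noise rather than by genuine distributional differences.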
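A weak correlation such as the reported r = 0.08–0.13 suggests that LD-Score and perplexity-based scores rank samples largely independently, which is what leaves room for a combined classifier to gain over either signal alone. As a minimal sketch, assuming per-sample scores from two detectors are already available, Pearson's r can be computed directly. The function name `pearson_r` and the score lists in the usage lines are illustrative placeholders, not data or code from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired score sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations; the 1/n factors cancel against the norms below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Illustrative toy scores for the same five samples (not results from the paper).
    ld_scores = [0.021, 0.034, 0.018, 0.040, 0.027]
    perplexity_scores = [12.5, 30.1, 25.4, 14.8, 19.9]
    print(f"r = {pearson_r(ld_scores, perplexity_scores):.3f}")
```

Note that Pearson's r only measures linear association; two weakly correlated scores can still interact non-linearly, which is one reason a non-linear classifier is a natural choice for combining them.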