Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection

arXiv cs.CL / 5/5/2026


Key Points

  • The paper argues that training-free AI text detection based on model log-probabilities has a performance ceiling because RLHF-trained models converge toward human-like probability distributions.
  • It proposes a new detection signal using character distribution signatures, supported by a theory that AI models follow global character patterns while humans show domain-specialized distributions, creating a “Wall of Separation.”
  • To evaluate this systematically, the authors introduce the MDTA benchmark with 642,274 prompt-aligned samples spanning 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, expanding on HC3 with modern responses and stronger augmentation.
  • They introduce the Letter Distribution Score (LD-Score), which shows low correlation (r = 0.08–0.13) with perplexity-based methods, and report that combining LD-Score with existing detectors via a non-linear classifier improves AUROC and F1, especially in specialized domains.
  • The MDTA dataset is released publicly on Hugging Face for further research and benchmarking.
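The character-distribution idea in the points above can be sketched in a few lines. The paper's exact LD-Score formula is not given in this summary, so the snippet below is a hypothetical illustration: it compares a text's letter-frequency distribution against a reference "global" distribution using Jensen-Shannon divergence (the function names and the choice of divergence are assumptions, not the authors' definition).

```python
import math
from collections import Counter

def letter_distribution(text):
    """Normalized frequencies of the letters a-z in text (other chars ignored)."""
    counts = Counter(c for c in text.lower() if 'a' <= c <= 'z')
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 26
    return [counts.get(chr(ord('a') + i), 0) / total for i in range(26)]

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two
    26-dimensional letter distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2((a + eps) / (b + eps)) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ld_score(text, reference_dist):
    """Hypothetical letter-distribution score: divergence of a text's letter
    frequencies from a global reference. Under the paper's hypothesis, human
    text in a specialized domain would drift further from the global reference
    than AI text does."""
    return js_divergence(letter_distribution(text), reference_dist)
```

A uniform distribution over a-z serves as a stand-in reference here; in practice the reference would be estimated from a large mixed-domain corpus.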

Abstract

Training-free AI text detection methods primarily rely on model log-probabilities, achieving strong performance through approaches like Binoculars and DNA-DetectLLM. However, these methods face a fundamental ceiling as models are optimized through RLHF to produce human-like probability distributions. We introduce an alternative detection signal based on character distribution signatures. We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a "Wall of Separation" where human-AI divergence significantly exceeds AI-AI divergence. To enable systematic evaluation, we construct the Models-Domains-Temperatures-Adversarials (MDTA) benchmark comprising 642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, substantially expanding the HC3 dataset with modern model responses, temperature variation, and adversarial augmentation. We introduce the Letter Distribution Score (LD-Score), demonstrating low correlation (r = 0.08–0.13) with perplexity methods. When integrated with DNA-DetectLLM, Binoculars, and FastDetectGPT via a non-linear classifier, LD-Score yields consistent improvements in AUROC and F1, with particularly pronounced gains in specialized domains where vocabulary constraints amplify the detection signal. The MDTA dataset can be accessed at: https://huggingface.co/datasets/nsp909/MDTA.
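The fusion step the abstract describes, combining LD-Score with perplexity-based detector outputs through a non-linear classifier, can be sketched as below. The paper does not specify the classifier in this summary, so gradient-boosted trees stand in for it here, and the features are synthetic random stand-ins, not MDTA data; the whole example only illustrates the fusion pattern.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, n)  # 1 = AI-generated, 0 = human (synthetic)

# Synthetic stand-ins for detector outputs; the noise makes the two
# features nearly uncorrelated, mimicking the low r reported between
# LD-Score and perplexity methods.
perplexity_feature = labels * 0.3 + rng.normal(0, 1, n)
ld_feature = labels * 0.5 + rng.normal(0, 1, n)
X = np.column_stack([perplexity_feature, ld_feature])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Because the two signals are weakly correlated, the combined classifier recovers an AUROC above what either noisy feature would give alone, which is the mechanism behind the reported AUROC/F1 gains.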