Breaking the Generator Barrier: Disentangled Representation for Generalizable AI-Text Detection

arXiv cs.CL / 4/16/2026


Key Points

  • The paper addresses the growing difficulty of detecting AI-generated text because generator-specific artifacts become unreliable as new LLMs emerge.
  • It proposes a disentangled detection framework that separates generator-aware artifacts from AI-detection semantics using a compact latent representation, perturbation-based regularization, and a discriminative adaptation stage.
  • Experiments on the MAGE benchmark (20 LLMs across 7 categories) show consistent gains over state-of-the-art approaches, with improvements of up to 24.2% in accuracy and 26.2% in F1.
  • The method exhibits scalability in open-set settings, with continued performance improvements as the diversity of training generators increases.
  • The authors will release the source code publicly to support replication and further research.

Abstract

As large language models (LLMs) generate text that increasingly resembles human writing, the subtle cues that distinguish AI-generated content from human-written content become ever harder to capture. Reliance on generator-specific artifacts is inherently unstable, since new models emerge rapidly and erode the robustness of such shortcuts. This makes generalization to unseen generators a central and challenging problem for AI-text detection. To tackle this challenge, we propose a progressively structured framework that disentangles AI-detection semantics from generator-aware artifacts. This is achieved through a compact latent encoding that encourages semantic minimality, followed by perturbation-based regularization to reduce residual entanglement, and finally a discriminative adaptation stage that aligns representations with the task objective. Experiments on the MAGE benchmark, covering 20 representative LLMs across 7 categories, demonstrate consistent improvements over state-of-the-art methods, with gains of up to 24.2% in accuracy and 26.2% in F1. Notably, performance continues to improve as the diversity of training generators increases, confirming strong scalability and generalization in open-set scenarios. Our source code will be publicly available at https://github.com/PuXiao06/DRGD.
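The core idea — a compact latent split into detection semantics and generator artifacts, with a perturbation-based regularizer that discourages the detector from relying on the artifact subspace — can be sketched in a toy NumPy example. Everything here (the linear encoder, the dimensions, the function names) is an illustrative assumption, not the paper's actual implementation; the sketch only shows how perturbing the artifact half of the latent yields a penalty that is zero for a head that ignores artifacts and positive for one that still depends on them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen for illustration only (not from the paper):
D_IN, D_SEM, D_ART = 32, 8, 8

# Toy linear "encoder" mapping input features to a compact latent,
# split into detection semantics z_sem and generator artifacts z_art.
W_enc = rng.normal(scale=0.1, size=(D_IN, D_SEM + D_ART))

def encode(x):
    z = x @ W_enc
    return z[:D_SEM], z[D_SEM:]

def score(z_sem, z_art, w_head):
    """AI-vs-human logit from a linear head over the full latent."""
    return float(np.concatenate([z_sem, z_art]) @ w_head)

def perturbation_penalty(x, w_head, sigma=0.5, n=200):
    """Perturb only the artifact subspace and measure how much the
    detection score moves; a training loss would minimize this."""
    z_sem, z_art = encode(x)
    s_clean = score(z_sem, z_art, w_head)
    noise = rng.normal(scale=sigma, size=(n, D_ART))
    diffs = [(score(z_sem, z_art + e, w_head) - s_clean) ** 2 for e in noise]
    return float(np.mean(diffs))

x = rng.normal(size=D_IN)
w_entangled = rng.normal(scale=0.1, size=D_SEM + D_ART)  # reads artifacts
w_clean = w_entangled.copy()
w_clean[D_SEM:] = 0.0                                    # ignores artifacts

p_ent = perturbation_penalty(x, w_entangled)
p_clean = perturbation_penalty(x, w_clean)
print(p_clean == 0.0, p_ent > 0.0)  # → True True
```

Driving this penalty toward zero pushes the detector's decision to depend only on the semantic subspace, which is the invariance to generator identity that the open-set experiments test.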