Exons-Detect: Identifying and Amplifying Exonic Tokens via Hidden-State Discrepancy for Robust AI-Generated Text Detection

arXiv cs.CL / 3/27/2026


Key Points

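  • Exons-Detect is proposed as a training-free method that detects AI-generated text by finding and amplifying informative tokens, which the authors liken to exons ("exonic" tokens).
  • Specifically, it measures hidden-state discrepancy under a dual-model setting, reweights tokens by importance, and aggregates the result into an interpretable score (the translation score).
  • Existing training-free methods typically assume that all tokens contribute uniformly, which weakens them; Exons-Detect aims to stay robust under localized modifications and on short texts.
  • In experiments on DetectRL, it achieves a 2.2% relative improvement in average AUROC over the strongest prior baseline, and it also shows robustness to adversarial attacks and varying input lengths.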

Abstract

The rapid advancement of large language models has increasingly blurred the boundary between human-written and AI-generated text, raising societal risks such as misinformation dissemination, authorship ambiguity, and threats to intellectual property rights. These concerns highlight the urgent need for effective and reliable detection methods. While existing training-free approaches often achieve strong performance by aggregating token-level signals into a global score, they typically assume uniform token contributions, making them less robust under short sequences or localized token modifications. To address these limitations, we propose Exons-Detect, a training-free method for AI-generated text detection based on an exon-aware token reweighting perspective. Exons-Detect identifies and amplifies informative exonic tokens by measuring hidden-state discrepancy under a dual-model setting, and computes an interpretable translation score from the resulting importance-weighted token sequence. Empirical evaluations demonstrate that Exons-Detect achieves state-of-the-art detection performance and exhibits strong robustness to adversarial attacks and varying input lengths. In particular, it attains a 2.2% relative improvement in average AUROC over the strongest prior baseline on DetectRL.
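To make the reweighting idea concrete, here is a minimal sketch of importance-weighted aggregation compared with uniform averaging. It is an illustration only, not the paper's method: the abstract does not give the discrepancy measure, the weighting rule, or the definition of the translation score. Cosine distance between the two models' hidden states, a softmax over those distances, and the names `cosine_distance`, `token_weights`, and `weighted_score` are all assumptions made for this example, and the hidden states and per-token signals are toy values rather than real model outputs.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def token_weights(hidden_a, hidden_b, temperature=1.0):
    """Turn per-token hidden-state discrepancy into weights that sum to 1.

    ASSUMPTION: discrepancy is cosine distance and weights are a softmax
    over it. The paper may define both differently.
    """
    if not hidden_a or len(hidden_a) != len(hidden_b):
        raise ValueError("need two non-empty, equal-length hidden-state sequences")
    distances = [cosine_distance(u, v) for u, v in zip(hidden_a, hidden_b)]
    peak = max(distances)  # subtract the max for numerical stability
    exps = [math.exp((d - peak) / temperature) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_score(token_signals, weights):
    """Aggregate per-token signals into one score using the given weights."""
    return sum(w * s for w, s in zip(weights, token_signals))


# Toy inputs (hypothetical): three tokens, 2-dimensional hidden states.
# The two models disagree only on the middle token.
hidden_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
hidden_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
signals = [0.1, 0.9, 0.1]  # hypothetical per-token detection signal

weights = token_weights(hidden_a, hidden_b)
uniform = sum(signals) / len(signals)
reweighted = weighted_score(signals, weights)
print(weights, uniform, reweighted)
```

In this toy case the middle token, where the two models disagree, receives the largest weight, so its signal dominates the aggregate instead of being diluted by the two uninformative tokens. That is the intuition behind dropping the uniform-contribution assumption for short or locally edited texts.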