Who Wrote the Book? Detecting and Attributing LLM Ghostwriters

arXiv cs.CL / 3/31/2026



Key Points

The paper introduces GhostWriteBench, a dataset for LLM authorship attribution built from book-length texts (50K+ words each) generated by frontier models and designed to test generalization across multiple out-of-distribution (OOD) dimensions.
It also proposes TRACE, an interpretable and lightweight fingerprinting approach that builds its fingerprint from token-level transition patterns (e.g., word rank) estimated by a separate lightweight language model.
TRACE is designed to work with both open- and closed-source LLMs, addressing a key challenge in real-world attribution where model access may be restricted.
Experiments reported on GhostWriteBench indicate TRACE delivers state-of-the-art results, maintains robustness under OOD conditions, and performs effectively with limited training data.
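
To make the idea of a fingerprint built from token-level transition patterns more concrete, here is a minimal sketch. It is not the paper's implementation: the summary does not specify the scoring model, the features, or the classifier. Everything below is an assumption made for illustration, including the toy bigram model standing in for the lightweight language model, the rank buckets in BUCKET_EDGES, the flattened transition matrix used as the fingerprint, and the nearest-centroid rule in attribute.

```python
from collections import Counter, defaultdict

# Hypothetical rank buckets: bucket 0 = rank 1, bucket 1 = ranks 2-3, bucket 2 = rank 4+.
BUCKET_EDGES = (1, 3)
N_BUCKETS = len(BUCKET_EDGES) + 1
UNSEEN_RANK = 10**9  # sentinel for tokens the scoring model never saw in this context


def train_bigram_lm(text):
    """Toy stand-in for the lightweight language model: plain bigram counts."""
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def token_ranks(lm, text):
    """Rank of each observed token among the model's candidates for its context."""
    tokens = text.lower().split()
    ranks = []
    for prev, nxt in zip(tokens, tokens[1:]):
        # Candidates sorted by descending count; ties broken alphabetically.
        ordered = sorted(lm.get(prev, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
        candidates = [tok for tok, _ in ordered]
        ranks.append(candidates.index(nxt) + 1 if nxt in candidates else UNSEEN_RANK)
    return ranks


def bucket(rank):
    """Map a rank to the index of the first bucket whose upper edge covers it."""
    for i, edge in enumerate(BUCKET_EDGES):
        if rank <= edge:
            return i
    return len(BUCKET_EDGES)


def fingerprint(lm, text):
    """Normalised counts of bucket-to-bucket transitions, flattened row by row."""
    buckets = [bucket(r) for r in token_ranks(lm, text)]
    counts = Counter(zip(buckets, buckets[1:]))
    total = sum(counts.values()) or 1  # avoid division by zero on very short texts
    return [counts[(i, j)] / total for i in range(N_BUCKETS) for j in range(N_BUCKETS)]


def attribute(fp, centroids):
    """Return the candidate author whose centroid is closest in L1 distance."""
    return min(centroids, key=lambda name: sum(abs(a - b) for a, b in zip(fp, centroids[name])))


if __name__ == "__main__":
    lm = train_bigram_lm("the cat sat on the mat the cat ate the fish")
    print(token_ranks(lm, "the cat sat on the fish"))  # [1, 2, 1, 1, 2]
    print(fingerprint(lm, "the cat sat on the fish"))
```

Because the scoring model only needs the text itself, a scheme like this never queries the model being attributed, which is consistent with the claim that the approach applies to closed-source LLMs. A realistic version would replace the bigram counts with a small neural language model and the nearest-centroid rule with a trained classifier.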

Abstract

In this paper, we introduce GhostWriteBench, a dataset for LLM authorship attribution. It comprises long-form texts (50K+ words per book) generated by frontier LLMs, and is designed to test generalisation across multiple out-of-distribution (OOD) dimensions, including domain and unseen LLM author. We also propose TRACE -- a novel fingerprinting method that is interpretable and lightweight -- that works for both open- and closed-source models. TRACE creates the fingerprint by capturing token-level transition patterns (e.g., word rank) estimated by another lightweight language model. Experiments on GhostWriteBench demonstrate that TRACE achieves state-of-the-art performance, remains robust in OOD settings, and works well in limited training data scenarios.
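
The two OOD dimensions named in the abstract (domain and unseen LLM author) can be pictured as a data split in which some authors and some domains never appear in training. The sketch below is only an illustration of that idea. The Book record, its field names, the split names, and the ood_split helper are hypothetical and are not taken from GhostWriteBench, whose actual protocol is not described in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    author: str  # hypothetical label for the generating LLM
    domain: str  # hypothetical genre or domain tag
    text: str


def ood_split(books, held_out_authors, held_out_domains):
    """Partition books so held-out authors and domains never reach the training set."""
    splits = {"train": [], "unseen_author": [], "unseen_domain": [], "unseen_both": []}
    for book in books:
        new_author = book.author in held_out_authors
        new_domain = book.domain in held_out_domains
        if new_author and new_domain:
            key = "unseen_both"
        elif new_author:
            key = "unseen_author"
        elif new_domain:
            key = "unseen_domain"
        else:
            key = "train"
        splits[key].append(book)
    return splits


if __name__ == "__main__":
    books = [
        Book("model_a", "fantasy", "..."),
        Book("model_b", "fantasy", "..."),
        Book("model_a", "mystery", "..."),
        Book("model_c", "fantasy", "..."),
    ]
    splits = ood_split(books, held_out_authors={"model_c"}, held_out_domains={"mystery"})
    for name, items in splits.items():
        print(name, len(items))
```

Keeping the OOD test sets separate makes it possible to report how an attribution method degrades along each dimension on its own, rather than as a single blended score.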

