Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
arXiv cs.CL / 4/24/2026
Key Points
- The paper addresses a key limitation in authorship attribution and AI-generated text detection: models often entangle writing style with content, causing poor cross-domain generalization.
- It proposes EAVAE (Explainable Authorship Variational Autoencoder), which disentangles style and content using a separation-by-design architecture with dedicated encoders for each.
- EAVAE pretrains style encoders via supervised contrastive learning on diverse author data, then fine-tunes using a variational autoencoder setup to learn disentangled latent representations.
- A novel discriminator serves two roles: it classifies whether a style/content representation pair comes from the same source or different sources, and it generates a natural-language explanation of its decision, aiming to reduce confounds and improve interpretability.
- Experimental results show state-of-the-art authorship attribution on Amazon Reviews, PAN21, and HRS, and strong few-shot performance for AI-generated text detection on the M4 dataset, with code/data released online.


