Grammar as a Behavioral Biometric: Using Cognitively Motivated Grammar Models for Authorship Verification

arXiv cs.CL / 4/13/2026

Key Points

The paper addresses Authorship Verification (AV) by proposing a simpler, more scientifically grounded approach that models an author’s grammar using Cognitive Linguistics principles.
It introduces a metric, λG, defined as the likelihood ratio of a document under the candidate author’s grammar versus a reference population’s grammar.
Across twelve datasets, λG outperforms seven baseline methods, including several neural network-based AV approaches.
The method is robust to small changes in the reference population composition and offers interpretable visualizations that improve explainability compared with many existing AV techniques.
The authors attribute λG’s effectiveness to its compatibility with Cognitive Linguistics theories, which predict that a person’s grammar is a behavioral biometric.
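
Written out in symbols, the likelihood-ratio definition from the key points looks roughly like this. The notation is assumed for illustration and is not taken from the paper: D is the questioned document, G_A is the candidate author's grammar model, and G_ref is the reference population's grammar model.

```latex
\lambda_G(D) = \frac{P(D \mid G_A)}{P(D \mid G_{\mathrm{ref}})},
\qquad
\log \lambda_G(D) = \log P(D \mid G_A) - \log P(D \mid G_{\mathrm{ref}})
```

A ratio above 1 (a log-ratio above 0) means the document is more probable under the candidate's grammar model than under the reference model.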

Abstract

Authorship Verification (AV) is a key area of research in digital text forensics, which addresses the fundamental question of whether two texts were written by the same person. Numerous computational approaches have been proposed over the last two decades in an attempt to address this challenge. However, existing AV methods often suffer from high complexity, low explainability and especially from a lack of clear scientific justification. We propose a simpler method based on modeling the grammar of an author following Cognitive Linguistics principles. These models are used to calculate $\lambda_G$ (LambdaG): the ratio of the likelihoods of a document given the candidate's grammar versus given a reference population's grammar. Our empirical evaluation, conducted on twelve datasets and compared against seven baseline methods, demonstrates that LambdaG achieves superior performance, including against several neural network-based AV methods. LambdaG is also robust to small variations in the composition of the reference population and provides interpretable visualizations, enhancing its explainability. We argue that its effectiveness is due to the method's compatibility with Cognitive Linguistics theories predicting that a person's grammar is a behavioral biometric.
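
To make the likelihood-ratio idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the grammar models, so this sketch substitutes toy word-bigram models with add-one smoothing, and every function name (`train_bigram`, `doc_loglik`, `log_likelihood_ratio`) is hypothetical. It scores a document under a model trained on the candidate author's text and under a model trained on reference text, then takes the difference of the log-likelihoods.

```python
import math
from collections import Counter


def train_bigram(sentences, vocab, k=1.0):
    """Return a function giving add-k smoothed log P(cur | prev).

    ASSUMPTION: a word-bigram model stands in for the paper's grammar model.
    """
    bigrams, contexts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        # Unseen bigrams and unseen contexts fall back to the smoothing mass.
        return math.log((bigrams[(prev, cur)] + k) / (contexts[prev] + k * v))

    return logprob


def doc_loglik(sentences, logprob):
    """Sum of token log-probabilities over all sentences of a document."""
    total = 0.0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        total += sum(logprob(p, c) for p, c in zip(tokens, tokens[1:]))
    return total


def log_likelihood_ratio(doc, author_sents, reference_sents):
    """log P(doc | author model) - log P(doc | reference model)."""
    # Shared vocabulary so both models distribute mass over the same events.
    vocab = {t for s in author_sents + reference_sents + doc for t in s}
    vocab.add("</s>")
    author_lp = train_bigram(author_sents, vocab)
    ref_lp = train_bigram(reference_sents, vocab)
    return doc_loglik(doc, author_lp) - doc_loglik(doc, ref_lp)


# Toy data (invented for illustration only).
author = [["i", "do", "not", "think", "so"], ["i", "do", "not", "know"]]
reference = [["i", "don't", "think", "so"], ["we", "don't", "know"]]

questioned = [["i", "do", "not", "think", "so"]]
print(log_likelihood_ratio(questioned, author, reference))
```

A positive score means the questioned text is more probable under the candidate's model than under the reference model; a negative score means the opposite. Turning that score into a same-author decision needs a threshold or calibration step, which this sketch leaves out.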