Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

arXiv cs.CL / April 23, 2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The paper argues that large language models often exhibit “epistemic-rhetorical miscalibration”: rhetorical intensity that is disproportionate to the underlying epistemic grounding.
  • It introduces a triadic epistemic-rhetorical marker (ERM) taxonomy, quantified with three composite metrics: form-meaning divergence (FMD), genuine-to-performed epistemic ratio (GPR), and rhetorical device distribution entropy (RDDE) (see the sketch after this list).
  • Using 225 argumentative texts (~0.6M tokens) across expert human, non-expert human, and LLM-generated corpora, the authors find a consistent, model-agnostic LLM “epistemic signature.”
  • The results show that LLM outputs overuse certain discourse patterns (e.g., tricolons and performed hesitancy markers), exhibit significantly higher FMD, and distribute rhetorical devices more uniformly than both human groups.
  • Because the annotation pipeline is fully automatable, the framework is positioned as a lightweight screening tool for miscalibration in AI-generated content and as features for LLM-generated text detection systems.
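
The summary does not give the paper's exact formulas, but a minimal sketch helps fix the intuition. Assuming per-document marker counts and [0, 1]-scaled intensity scores, the three composites might be computed as below; the function names, the normalization of RDDE by log2(K), and the definition of FMD as an absolute gap are illustrative assumptions, not the authors' operationalizations.

```python
import math
from collections import Counter

def rdde(device_counts: Counter) -> float:
    """Rhetorical device distribution entropy: Shannon entropy of a
    document's device distribution, normalized by log2(K) so that
    0 = one device dominates and 1 = perfectly uniform use."""
    total = sum(device_counts.values())
    k = len(device_counts)
    if total == 0 or k < 2:
        return 0.0
    probs = [c / total for c in device_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs) / math.log2(k)

def gpr(genuine: int, performed: int) -> float:
    """Genuine-to-performed epistemic ratio: markers that track real
    uncertainty over markers judged to be performed hedging."""
    return genuine / max(performed, 1)

def fmd(rhetorical_intensity: float, epistemic_grounding: float) -> float:
    """Form-meaning divergence: gap between rhetorical intensity and
    epistemic grounding, both assumed pre-scaled to [0, 1]."""
    return abs(rhetorical_intensity - epistemic_grounding)

# Toy document: tricolon-heavy, few rhetorical questions.
counts = Counter(tricolon=9, erotema=1, anaphora=4, hypophora=2)
print(f"RDDE = {rdde(counts):.2f}")   # ~0.80: fairly uniform device use
print(f"GPR  = {gpr(3, 6):.2f}")      # 0.50: hedging mostly performed
print(f"FMD  = {fmd(0.9, 0.4):.2f}")  # 0.50: form outruns grounding
```

Normalizing the entropy by log2(K) keeps RDDE comparable across device inventories of different sizes, which matters when corpora are annotated with different taxonomy subsets.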

Abstract

Large language models (LLMs) are hypothesized to exhibit systematic miscalibration, with rhetorical intensity not proportionate to epistemic grounding. This study tests that hypothesis and proposes a framework for quantifying the decoupling via a triadic epistemic-rhetorical marker (ERM) taxonomy. The taxonomy is operationalized through three composite metrics: form-meaning divergence (FMD), genuine-to-performed epistemic ratio (GPR), and rhetorical device distribution entropy (RDDE). Applied to 225 argumentative texts spanning approximately 0.6 million tokens across human expert, human non-expert, and LLM-generated sub-corpora, the framework identifies a consistent, model-agnostic LLM epistemic signature. LLM-generated texts produce tricolon at nearly twice the expert rate (Δ = 0.95), while human authors produce erotema at more than twice the LLM rate; performed hesitancy markers appear at twice the human density in LLM output. FMD is significantly elevated in LLM texts relative to both human groups (p < 0.001, Δ = 0.68), and rhetorical devices are distributed significantly more uniformly across LLM documents. The findings are consistent with theoretical intuitions derived from Gricean pragmatics, Relevance Theory, and Brandomian inferentialism. Because the annotation pipeline is fully automatable, the framework can be deployed as a lightweight screening tool for epistemic miscalibration in AI-generated content and as a theoretically motivated feature set for LLM-generated text detection pipelines.
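
Since the abstract presents the annotation pipeline as fully automatable, the screening use case reduces to density counts over a marker inventory. The sketch below assumes a hypothetical hedging lexicon and a crude single-pattern proxy for tricolon; the paper's actual ERM inventory and parallelism detection are certainly richer.

```python
import re

# Illustrative marker lexicon only; these stand in for the paper's
# (unreproduced) performed-hesitancy inventory.
PERFORMED_HESITANCY = [
    r"\bit is worth noting\b",
    r"\bto some extent\b",
    r"\barguably\b",
    r"\bone might argue\b",
]
# Crude surface proxy for tricolon: three single-word items "X, Y, and Z".
TRICOLON = r"\b[\w-]+, [\w-]+, and [\w-]+\b"

def density(text: str, patterns: list[str], per: int = 1000) -> float:
    """Pattern hits per `per` whitespace-delimited tokens."""
    tokens = max(len(text.split()), 1)
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in patterns)
    return hits * per / tokens

sample = ("Arguably, the method is fast, simple, and robust. "
          "It is worth noting that, to some extent, results may vary.")
print(density(sample, PERFORMED_HESITANCY))  # hedging markers per 1k tokens
print(density(sample, [TRICOLON]))           # tricolon hits per 1k tokens
```

Per-thousand-token densities of this kind could then feed a detector as features, in the spirit of the feature-set use the authors propose, though the specific patterns above are placeholders rather than their published inventory.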