Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting

arXiv cs.AI / 4/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper tests whether large language models capture human social meaning both qualitatively and quantitatively, using two new calibration-focused metrics, the Effect Size Ratio (ESR) and the Calibration Deviation Score (CDS), to separate structural fidelity from magnitude calibration (see the sketch after this list).
  • In a case study on numerical (im)precision, frontier LLMs reproduce the qualitative structure of human social inferences but differ widely in how well they calibrate the magnitude of those inferences.
  • Prompting grounded in pragmatic theory, specifically encouraging reasoning about the speaker’s knowledge state and communicative motives, reduces magnitude deviation most consistently, while prompting focused on alternative-awareness tends to amplify exaggeration.
  • Combining both pragmatic components is the only intervention that improves all calibration-sensitive metrics across every evaluated model, though fine-grained magnitude calibration remains only partially resolved.
  • Overall, the results suggest LLMs model the inferential structure of pragmatic/social reasoning but still distort inferential strength; pragmatic-theory prompting offers a useful but incomplete remedy.

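To make the two metrics concrete, here is a minimal Python sketch of one plausible operationalization. The paper's exact formulas are not given in this summary, so the definitions below are assumptions: ESR is taken as the ratio of model to human standardized effect sizes (Cohen's d) for an imprecise-vs-precise contrast, and CDS as the mean absolute deviation between per-item model and human mean ratings.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference between two samples of ratings."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

def effect_size_ratio(model_imprecise, model_precise,
                      human_imprecise, human_precise):
    """ESR (assumed form): model effect size over human effect size.
    ESR near 1: magnitude matches humans; > 1: exaggerated; < 1: attenuated.
    A positive ratio means the model's contrast points the same way as humans'."""
    return (cohens_d(model_imprecise, model_precise) /
            cohens_d(human_imprecise, human_precise))

def calibration_deviation_score(model_means, human_means):
    """CDS (assumed form): mean absolute deviation between per-item model
    and human mean ratings on the same scale; 0 = perfectly calibrated."""
    m, h = np.asarray(model_means, float), np.asarray(human_means, float)
    return float(np.mean(np.abs(m - h)))

# Toy example: the model gets the direction right (structure) but
# exaggerates the size of the contrast (magnitude), so ESR > 1.
rng = np.random.default_rng(0)
human_p, human_i = rng.normal(4.0, 1.0, 60), rng.normal(5.0, 1.0, 60)
model_p, model_i = rng.normal(3.5, 1.0, 60), rng.normal(6.5, 1.0, 60)
print(round(effect_size_ratio(model_i, model_p, human_i, human_p), 2))
```

The point of separating the two quantities: a model can show a large effect in the right direction (good structure) while ESR and CDS reveal systematic over- or understatement of inferential strength (poor calibration).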
Abstract

Large language models (LLMs) increasingly exhibit human-like patterns of pragmatic and social reasoning. This paper addresses two related questions: do LLMs approximate human social meaning not only qualitatively but also quantitatively, and can prompting strategies informed by pragmatic theory improve this approximation? To address the first, we introduce two calibration-focused metrics distinguishing structural fidelity from magnitude calibration: the Effect Size Ratio (ESR) and the Calibration Deviation Score (CDS). To address the second, we derive prompting conditions from two pragmatic assumptions: that social meaning arises from reasoning over linguistic alternatives, and that listeners infer speaker knowledge states and communicative motives. Applied to a case study on numerical (im)precision across three frontier LLMs, we find that all models reliably reproduce the qualitative structure of human social inferences but differ substantially in magnitude calibration. Prompting models to reason about speaker knowledge and motives most consistently reduces magnitude deviation, while prompting for alternative-awareness tends to amplify exaggeration. Combining both components is the only intervention that improves all calibration-sensitive metrics across all models, though fine-grained magnitude calibration remains only partially resolved. LLMs thus capture inferential structure while variably distorting inferential strength, and pragmatic theory provides a useful but incomplete handle for improving that approximation.
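As an illustration of what prompting conditions derived from these two pragmatic assumptions might look like, here is a hypothetical sketch. The condition names and template wording are assumptions for exposition, not the paper's actual prompts.

```python
# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
ALTERNATIVES = (
    "Before answering, consider what else the speaker could have said "
    "(for example, a round number instead of a precise one) and what "
    "choosing this wording over those alternatives signals."
)
KNOWLEDGE_AND_MOTIVES = (
    "Before answering, reason about what the speaker plausibly knows "
    "and what they are trying to achieve by phrasing it this way."
)

def build_prompt(utterance: str, question: str, condition: str) -> str:
    """Assemble a rating prompt for one of the (assumed) four conditions."""
    preamble = {
        "baseline": "",
        "alternatives": ALTERNATIVES,
        "knowledge_motives": KNOWLEDGE_AND_MOTIVES,
        "combined": ALTERNATIVES + " " + KNOWLEDGE_AND_MOTIVES,
    }[condition]
    return f'{preamble}\nSpeaker: "{utterance}"\n{question}'.strip()

print(build_prompt("It cost $203.", "How confident is the speaker? (1-7)",
                   "combined"))
```

Under this framing, the "combined" condition corresponds to the intervention the paper reports as the only one improving all calibration-sensitive metrics across all models.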