How Hypocritical Is Your LLM Judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
arXiv cs.CL / 4/20/2026
📰 News · Models & Research
Key Points
- The paper investigates whether an LLM's performance as a “pragmatic judge” (listener) aligns with its performance as a “pragmatic speaker” (generator) on the same pragmatic-competence tasks.
- By directly comparing LLMs in both roles, the study evaluates multiple open-weight and proprietary models across three pragmatic settings.
- Results show a consistent asymmetry: many LLMs are substantially better at judging whether an utterance is pragmatically appropriate than at generating pragmatically appropriate language themselves.
- The findings indicate that pragmatic evaluation and pragmatic generation are only weakly aligned in current LLMs, suggesting a need for more integrated evaluation approaches.
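The listener/speaker comparison described in the key points can be sketched as a minimal evaluation loop: score the same model once as a judge of labeled utterances and once as a generator scored against the same appropriateness criterion, then compare the two accuracies. Everything below is a hypothetical illustration with a toy stub "model"; it is not the paper's actual protocol, data, or scoring function.

```python
# Minimal sketch of a listener-vs-speaker evaluation, assuming a toy
# appropriateness criterion: a reply is "appropriate" if it mentions
# the context's topic. All names here are illustrative placeholders.

def model_judge(utterance: str, context: dict) -> bool:
    # Listener role: the stub "model" labels an utterance appropriate
    # when it mentions the contextually relevant topic.
    return context["topic"] in utterance

def model_generate(context: dict) -> str:
    # Speaker role: the stub "model" ignores the topic and always
    # emits a generic fallback reply (simulating weak generation).
    return context["fallback_reply"]

def listener_accuracy(items: list[dict]) -> float:
    # Fraction of items where the judge's verdict matches the gold label.
    correct = sum(
        model_judge(it["utterance"], it["context"]) == it["label"]
        for it in items
    )
    return correct / len(items)

def speaker_accuracy(items: list[dict]) -> float:
    # Fraction of generations that satisfy the appropriateness criterion.
    correct = sum(
        it["context"]["topic"] in model_generate(it["context"])
        for it in items
    )
    return correct / len(items)

# Toy evaluation set (hypothetical data, two items).
toy_items = [
    {
        "context": {"topic": "weather", "fallback_reply": "ok"},
        "utterance": "nice weather today",
        "label": True,
    },
    {
        "context": {"topic": "weather", "fallback_reply": "ok"},
        "utterance": "I like trains",
        "label": False,
    },
]

# The listener-speaker asymmetry is the gap between the two scores.
asymmetry = listener_accuracy(toy_items) - speaker_accuracy(toy_items)
```

With this deliberately lopsided stub, the judge scores 1.0 while the generator scores 0.0, yielding an asymmetry of 1.0 — the pattern (judging outperforming generating) that the paper reports for many real LLMs, though of course with far smaller gaps and far richer scoring.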