Extending Minimal Pairs with Ordinal Surprisal Curves and Entropy Across Applied Domains

arXiv cs.CL / March 17, 2026

Key Points

  • The paper extends the minimal pairs evaluation from binary grammaticality judgments to ordinal-scale classification using information-theoretic surprisal and entropy to capture both the model's preferred response and its uncertainty.
  • It computes negative log probabilities (surprisal) at each position on rating scales (e.g., 1-5 or 1-9) rather than requiring text generation.
  • The framework is demonstrated across four domains—social-ecological-technological systems classification, causal statement identification, figurative language detection, and deductive qualitative coding—showing interpretable signals.
  • Surprisal curves display minima near expected scale positions and higher entropy for genuinely ambiguous items, offering a nuanced view of model knowledge beyond generation-based evaluations.
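The core quantities in the points above are simple to compute once per-position log probabilities are available. The sketch below is a minimal illustration, not the paper's code: it assumes access to the model's raw logits over the rating-scale tokens (e.g., "1" through "5"), and the function name and inputs are hypothetical.

```python
import math

def surprisal_curve(logits):
    """Turn raw logits over rating-scale tokens (e.g. "1".."5") into a
    surprisal curve (negative log probability per position) plus the
    entropy of the resulting distribution. Hypothetical sketch: assumes
    the vocabulary has already been restricted to the scale tokens."""
    # Numerically stable softmax over the scale positions only
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Surprisal = -log p at each scale position (in nats)
    surprisals = [-math.log(p) for p in probs]
    # Shannon entropy summarizes the model's uncertainty
    entropy = -sum(p * math.log(p) for p in probs)
    return surprisals, entropy

# Hypothetical logits for a 1-5 scale where the model prefers "4"
s, h = surprisal_curve([0.1, 0.5, 1.2, 3.0, 0.8])
preferred = s.index(min(s)) + 1  # minimum of the curve = preferred rating
```

The minimum of the surprisal curve picks out the model's preferred scale position without any text generation, and the entropy is bounded above by log(K) for a K-point scale, which gives a natural reference for "how uncertain" a given item is.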

Abstract

The minimal pairs paradigm of comparing model probabilities for contrasting completions has proven useful for evaluating linguistic knowledge in language models, yet its application has largely been confined to binary grammaticality judgments over syntactic phenomena. Additionally, standard prompting-based evaluation requires expensive text generation, may elicit post-hoc rationalizations rather than model judgments, and discards information about model uncertainty. We address both limitations by extending surprisal-based evaluation from binary grammaticality contrasts to ordinal-scale classification and scoring tasks across multiple domains. Rather than asking models to generate answers, we measure the information-theoretic "surprise" (negative log probability) they assign to each position on rating scales (e.g., 1-5 or 1-9), yielding full surprisal curves that reveal both the model's preferred response and its uncertainty via entropy. We explore this framework across four domains: social-ecological-technological systems classification, causal statement identification (binary and scaled), figurative language detection, and deductive qualitative coding. Across these domains, surprisal curves produce interpretable classification signals with clear minima near expected ordinal scale positions, and entropy over the completion distribution tends to distinguish genuinely ambiguous items from easier ones.
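The abstract's closing claim, that entropy separates genuinely ambiguous items from easy ones, can be illustrated with a toy comparison. The distributions below are invented for illustration; they are not results from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a distribution over scale positions."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical probability distributions over a 1-5 rating scale:
confident = [0.02, 0.03, 0.05, 0.85, 0.05]  # clear preference for "4"
ambiguous = [0.18, 0.22, 0.20, 0.21, 0.19]  # no strong preference

# A near-uniform distribution carries higher entropy, flagging the
# item as genuinely ambiguous rather than cleanly classifiable.
gap = entropy(ambiguous) - entropy(confident)
```

An easy item concentrates probability mass on one scale position and yields low entropy; an ambiguous item spreads mass across positions and approaches the log(5) upper bound, which is the signal the paper reports using to separate the two.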