Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge

arXiv cs.AI / 4/8/2026


Key Points

  • The paper shows that when LLM-as-a-Judge is used for automated trust evaluation, disclosed source labels can bias outcomes even when the underlying content is identical.
  • Using a counterfactual setup, both human participants and LLM judges rated identical content as more trustworthy when it was labeled “human-authored” than when it was labeled “AI-generated” (a minimal harness sketch follows this list).
  • Eye-tracking evidence indicates that humans use the source label as a heuristic shortcut, and the study finds that LLMs similarly over-allocate attention to the label region relative to the content region.
  • The label-driven effects differ by condition: label dominance is stronger under “Human” labels, and the LLM’s measured decision uncertainty is higher under “AI” labels.
  • The authors raise validity concerns for label-sensitive LLM-as-a-Judge evaluation and call for debiased evaluation and mitigation strategies, cautioning that aligning models with human preferences may transfer human heuristic reliance to models.
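
The counterfactual design reduces to a simple harness: judge the same content twice, varying only the disclosed source label. Below is a minimal sketch, assuming a hypothetical `judge_fn` (e.g., a wrapper around an LLM API) and illustrative prompt wording; neither is taken from the paper.

```python
# Sketch of a counterfactual label-swap evaluation: identical content is
# judged under both source labels, and only the label text differs.
from typing import Callable

LABELS = {"human": "human-authored", "ai": "AI-generated"}

def counterfactual_trust(content: str, judge_fn: Callable[[str], float]) -> dict:
    """Return trust ratings for identical content under swapped source labels."""
    ratings = {}
    for key, label in LABELS.items():
        # Prompt wording is an illustrative assumption, not the paper's template.
        prompt = (
            f"The following text is {label}.\n\n"
            f"{content}\n\n"
            "On a scale of 1-7, how trustworthy is this text? Reply with a number."
        )
        ratings[key] = judge_fn(prompt)
    # A positive gap means the judge trusts the "human" label more.
    ratings["label_gap"] = ratings["human"] - ratings["ai"]
    return ratings

if __name__ == "__main__":
    # Dummy judge that mimics the direction of the reported bias.
    dummy = lambda p: 5.0 if "human-authored" in p else 4.2
    print(counterfactual_trust("Water boils at 100 °C at sea level.", dummy))
```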

Abstract

Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This work challenges the reliability of that paradigm by showing that trust judgments by LLMs are biased by disclosed source labels. Using a counterfactual design, we find that both humans and LLM judges assign higher trust to information labeled as human-authored than to the same content labeled as AI-generated. Eye-tracking data reveal that humans rely heavily on source labels as heuristic cues for judgment. We also analyze LLM internal states during judgment. Across label conditions, models allocate denser attention to the label region than to the content region, and this label dominance is stronger under Human labels than AI labels, consistent with human gaze patterns. In addition, decision uncertainty measured from logits is higher under AI labels than Human labels. These results indicate that the source label is a salient heuristic cue for both humans and LLMs. This raises validity concerns for label-sensitive LLM-as-a-Judge evaluation, and we cautiously note that aligning models with human preferences may propagate human heuristic reliance into models, motivating debiased evaluation and alignment.
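
The internal-state analysis described in the abstract can be approximated with an open-weight model. The sketch below is not the authors' code: it measures per-token attention density on the label span versus the content span at the decision position, and uses entropy over yes/no answer logits as a stand-in for the paper's logit-based uncertainty. The model (`gpt2`), prompt wording, span alignment, and answer tokens are all illustrative assumptions.

```python
# Minimal sketch of the two internal measurements: attention mass on the label
# span vs. the content span, and logit-based uncertainty at the judgment token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in model; the paper does not use this one
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

label = "Source: human-authored."  # swap to "Source: AI-generated." for the counterfactual
content = "Water boils at 100 °C at sea level."
prompt = f"{label}\n{content}\nDo you trust this statement? Answer yes or no:"

enc = tok(prompt, return_tensors="pt")
# Approximate span boundaries via token counts (GPT-2 adds no special tokens).
label_len = len(tok(label)["input_ids"])
content_len = len(tok("\n" + content)["input_ids"])

with torch.no_grad():
    out = model(**enc, output_attentions=True)

# Attention from the final (decision) position, averaged over layers and heads.
att = torch.stack(out.attentions).mean(dim=(0, 2))[0, -1]  # shape: (seq_len,)
label_density = (att[:label_len].sum() / label_len).item()
content_density = (att[label_len:label_len + content_len].sum() / content_len).item()
print(f"label density {label_density:.4f} vs content density {content_density:.4f}")

# Decision uncertainty: entropy over the yes/no answer logits at the next token.
logits = out.logits[0, -1]
ids = [tok(" yes")["input_ids"][0], tok(" no")["input_ids"][0]]
p = torch.softmax(logits[ids], dim=-1)
entropy = -(p * p.log()).sum().item()
print(f"P(yes)={p[0].item():.3f}  entropy={entropy:.3f}")
```

Running this once per label condition and comparing the two densities (and entropies) mirrors the comparison the paper reports: denser attention on the label span, with higher uncertainty under the AI label.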