AI Navigate

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

arXiv cs.AI / 3/12/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study investigates whether large language models exhibit human-like cognitive biases that could affect judicial sentencing decisions, focusing on virtuous victim effects and prestige-based halo effects.
  • It uses vignettes modified to avoid training-data recall and evaluates five representative LLMs across multiple trials to isolate each manipulation.
  • The findings show a virtuous victim effect larger than in humans, no statistically significant penalty when the victim gave adjacent consent, and halo effects slightly weaker than human benchmarks, with credential-based prestige showing the largest reduction.
  • Despite cross-model variation that limits current judicial use, the models showed only modest improvements over human benchmarks, underscoring the need for caution and bias mitigation.

Abstract

We investigate whether large language models (LLMs) display human-like cognitive biases, focusing on the implications for assistance in judicial sentencing, a decision-making setting where fairness is paramount. Two of the most relevant biases were chosen: the virtuous victim effect (VVE), with emphasis on its reduction when adjacent consent is present, and prestige-based halo effects (occupation, company, and credentials). Using vignettes altered from prior literature to prevent LLMs from recalling them from training data, we isolate each manipulation by holding all other details constant and measuring the percentage difference in outcomes. Five representative LLMs were evaluated in independent multi-run trials per condition (ChatGPT 5 Instant, ChatGPT 5 Thinking, DeepSeek V3.1, Claude Sonnet 4, Gemini 2.5 Flash). We find a larger VVE than in humans, no statistically significant penalty for adjacent consent, and halo effects slightly reduced relative to human benchmarks, with the exception of credential-based prestige, which showed a large reduction. Although variation across models and outputs restricts current judicial usage, the models showed modest improvements over human benchmarks.
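The measurement described in the abstract — paired vignettes differing in exactly one manipulation, repeated trials, and a percentage difference in mean outcomes — can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the sample sentencing values are hypothetical.

```python
from statistics import mean

def percent_difference(control: list[float], treatment: list[float]) -> float:
    """Percent change in mean outcome from the control vignette to the
    treatment vignette (e.g. the virtuous-victim manipulation)."""
    c, t = mean(control), mean(treatment)
    return 100.0 * (t - c) / c

# Hypothetical sentence recommendations (months) from repeated trials of one
# model on paired vignettes that differ only in the victim-virtue description.
neutral_victim  = [24, 26, 25, 27, 24]   # control condition
virtuous_victim = [30, 31, 29, 32, 30]   # identical vignette, virtuous victim

vve = percent_difference(neutral_victim, virtuous_victim)
print(f"Virtuous victim effect: {vve:+.1f}% change in mean sentence")
```

A positive value indicates harsher sentencing under the manipulation; running each condition many times, as the study does, is what makes the per-model differences comparable across manipulations.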