When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making

arXiv cs.CL / 3/20/2026

Key Points

  • The paper introduces ICE-Guard, a framework that applies intervention consistency testing to detect three types of spurious feature reliance in LLMs, evaluated across 3,000 vignettes spanning 10 high-stakes domains and 11 LLMs from 8 families.
  • The study identifies three bias types—demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements)—and finds authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%).
  • Bias concentration varies by domain, with finance showing 22.6% authority bias and criminal justice showing only 2.8%.
  • A structured decomposition approach, where the LLM extracts features and a deterministic rubric makes the final decision, reduces flip rates by up to 100% (a median reduction of 49% across 9 models).
  • An ICE-guided detect-diagnose-mitigate-verify loop achieves a cumulative 78% bias reduction via iterative prompt patching, and validation against real COMPAS recidivism data suggests the benchmark provides a conservative estimate of real-world bias; code and data are publicly available.
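The core measurement behind these findings is a flip rate: swap only a spurious feature (a name, a credential, the framing) between two otherwise identical vignettes and count how often the verdict changes. A minimal sketch of that test, with a toy decision function standing in for an LLM call (all names here are hypothetical, not from the paper's code):

```python
from typing import Callable, List, Tuple

def flip_rate(decide: Callable[[str], str],
              pairs: List[Tuple[str, str]]) -> float:
    """Fraction of vignette pairs whose verdict changes when only a
    spurious feature is swapped between the two versions."""
    flips = sum(decide(a) != decide(b) for a, b in pairs)
    return flips / len(pairs)

# Toy decider that leans on an authority cue -- a stand-in for an LLM.
def biased_decide(vignette: str) -> str:
    return "approve" if "Dr." in vignette else "deny"

# Each pair differs only in the title (an authority swap).
pairs = [
    ("Dr. Smith requests a $10k loan.", "Mr. Smith requests a $10k loan."),
    ("Dr. Lee requests a $5k loan.", "Mr. Lee requests a $5k loan."),
]
print(flip_rate(biased_decide, pairs))  # 1.0: every swap flips the verdict
```

A bias-free decider would score 0.0 on the same pairs, which is what the intervention consistency criterion demands.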

Abstract

Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature reliance: demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements). Across 3,000 vignettes spanning 10 high-stakes domains, we evaluate 11 LLMs from 8 families and find that (1) authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%), challenging the field's narrow focus on demographics; (2) bias concentrates in specific domains -- finance shows 22.6% authority bias while criminal justice shows only 2.8%; (3) structured decomposition, where the LLM extracts features and a deterministic rubric decides, reduces flip rates by up to 100% (median 49% across 9 models). We demonstrate an ICE-guided detect-diagnose-mitigate-verify loop achieving cumulative 78% bias reduction via iterative prompt patching. Validation against real COMPAS recidivism data shows COMPAS-derived flip rates exceed pooled synthetic rates, suggesting our benchmark provides a conservative estimate of real-world bias. Code and data are publicly available.
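The structured decomposition mitigation described above can be sketched as follows: the model's only job is to extract task-relevant fields, and a deterministic rubric makes the final call, so spurious features like titles never reach the decision. This is an illustrative sketch, not the paper's implementation; the rubric, field names, and regex-based "extraction" stand in for an actual LLM extraction step:

```python
import re

def rubric_decide(features: dict) -> str:
    """Hypothetical deterministic rubric over task-relevant fields only."""
    score = 0
    score += 2 if features["income"] >= 50_000 else 0
    score += 1 if features["pays_on_time"] else 0
    return "approve" if score >= 2 else "deny"

def extract_features(vignette: str) -> dict:
    """Stand-in for the LLM extraction step: pull out only the fields
    the rubric uses, discarding names, credentials, and framing."""
    income = int(re.search(r"income of \$(\d+)", vignette).group(1))
    return {"income": income, "pays_on_time": "on time" in vignette}

# The two vignettes differ only in an authority cue (Dr. vs Mr.).
a = "Dr. Smith, income of $60000, pays on time."
b = "Mr. Smith, income of $60000, pays on time."
print(rubric_decide(extract_features(a)))  # approve
print(rubric_decide(extract_features(b)))  # approve -- no flip
```

Because the rubric sees only the extracted fields, identical features guarantee identical verdicts, which is why this design can drive flip rates toward zero.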