Large Language Models Reproduce Racial Stereotypes When Used for Text Annotation

arXiv cs.CL / March 17, 2026

Key Points

  • A large-scale evaluation of 19 LLMs across two experiments with more than 4 million annotation judgments shows that automated text annotation systematically reflects racial stereotypes, even when annotating neutral text.
  • In the names-based experiment spanning 39 tasks, texts with Black-associated names were rated as more aggressive and more gossipy by the majority of models, while Asian names were perceived as more intelligent but less confident and less sociable.
  • In the dialect experiment, the same sentence written in African American Vernacular English was judged significantly less professional, less indicative of an educated speaker, more toxic, and more angry by nearly all models.
  • The findings imply that using LLMs as automated annotators can embed socially patterned biases into the datasets and measurements underpinning research, governance, and decision-making. One notable exception is name-based hireability, where fine-tuning appears to overcorrect, systematically favoring minority-named applicants.

Abstract

Large language models (LLMs) are increasingly used for automated text annotation in tasks ranging from academic research to content moderation and hiring. Across 19 LLMs and two experiments totaling more than 4 million annotation judgments, we show that subtle identity cues embedded in text systematically bias annotation outcomes in ways that mirror racial stereotypes. In a names-based experiment spanning 39 annotation tasks, texts containing names associated with Black individuals are rated as more aggressive by 18 of 19 models and more gossipy by 18 of 19. Asian names produce a bamboo-ceiling profile: 17 of 19 models rate individuals as more intelligent, while 18 of 19 rate them as less confident and less sociable. Arab names elicit cognitive elevation alongside interpersonal devaluation, and all four minority groups are consistently rated as less self-disciplined. In a matched dialect experiment, the same sentence is judged significantly less professional (all 19 models, mean gap -0.774), less indicative of an educated speaker (-0.688), more toxic (18/19), and more angry (19/19) when written in African American Vernacular English rather than Standard American English. A notable exception occurs for name-based hireability, where fine-tuning appears to overcorrect, systematically favoring minority-named applicants. These findings suggest that using LLMs as automated annotators can embed socially patterned biases directly into the datasets and measurements that increasingly underpin research, governance, and decision-making.
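The names-based experiment rests on a matched-pair (counterfactual substitution) design: the same text is annotated twice, differing only in the embedded name, so any systematic rating gap is attributable to the name cue. A minimal sketch of that design follows; the `annotate` stub, the template sentences, and the name lists are all illustrative placeholders, not the paper's actual stimuli or prompts.

```python
from statistics import mean

# Hypothetical matched-pair bias probe. Each template is annotated with
# names from two groups; a nonzero mean gap in the returned ratings
# would indicate name-based bias. All names/templates are illustrative.

TEMPLATES = [
    "{name} raised their voice during the meeting.",
    "{name} shared an update about a coworker's weekend plans.",
]

GROUP_A = ["Emily", "Greg"]      # illustrative majority-associated names
GROUP_B = ["Lakisha", "Jamal"]   # illustrative minority-associated names


def annotate(text: str, trait: str) -> float:
    """Stand-in for an LLM annotation call returning a 1-5 rating.
    A real probe would prompt the model, e.g. asking it to rate how
    {trait} the text sounds on a 1-5 scale. Here we return a constant
    so the pipeline runs end to end without an API call."""
    return 3.0


def mean_gap(trait: str) -> float:
    """Mean rating difference (group B minus group A) over all
    template/name pairs for a given trait."""
    a = [annotate(t.format(name=n), trait) for t in TEMPLATES for n in GROUP_A]
    b = [annotate(t.format(name=n), trait) for t in TEMPLATES for n in GROUP_B]
    return mean(b) - mean(a)


if __name__ == "__main__":
    print(f"aggressiveness gap: {mean_gap('aggressive'):+.3f}")
```

With a real model behind `annotate`, the paper's reported effects would show up as positive gaps on traits like "aggressive" for Black-associated names, and the same harness extends to the dialect experiment by swapping dialect variants of a sentence instead of names.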