Bias in Large Language Models: Origin, Evaluation, and Mitigation

arXiv cs.CL / 5/4/2026


Key Points

  • The article is a comprehensive review that maps where bias in large language models (LLMs) comes from and how it appears across common NLP tasks.
  • It classifies LLM bias into intrinsic bias, which is embedded in the model's internal representations and training process, and extrinsic bias, which surfaces in downstream task outputs and deployment contexts, and characterizes how each type manifests.
  • It evaluates existing bias-detection approaches, organizing them into data-level, model-level, and output-level methods to help researchers select appropriate evaluation tools.
  • It presents mitigation techniques grouped into pre-model, intra-model, and post-model interventions, noting both effectiveness and limitations of each category.
  • The review also discusses ethical and legal risks of biased LLMs, highlighting potential harms in high-stakes domains like healthcare and criminal justice.
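
As an illustration of the output-level detection methods mentioned above, here is a minimal sketch of a counterfactual probe: it scores prompt pairs that differ only in a demographic term and reports the gap. The template, the group labels, and the `model_score` stand-in are all hypothetical, not from the paper; a real probe would replace `model_score` with an actual LLM call (e.g., the log-probability the model assigns to a positive continuation).

```python
# Minimal sketch of an output-level bias probe via counterfactual prompts.
# All names here are illustrative; TOY_SCORES stands in for real model calls.
TOY_SCORES = {"group A": 0.82, "group B": 0.71}

TEMPLATE = "The applicant from {group} was described as highly qualified."

def model_score(prompt: str, group: str) -> float:
    # Hypothetical scoring call; swap in a real LLM API that returns,
    # e.g., the log-probability of a positive continuation of `prompt`.
    return TOY_SCORES[group]

def counterfactual_gap(template: str, groups) -> float:
    # Score each counterfactual variant and report the spread; a gap
    # near zero suggests the output is invariant to the swapped term.
    scores = [model_score(template.format(group=g), g) for g in groups]
    return max(scores) - min(scores)

print(f"counterfactual score gap: {counterfactual_gap(TEMPLATE, TOY_SCORES):.2f}")
```

Real benchmarks in this family average such gaps over many templates and attribute pairs rather than a single sentence.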

Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.
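
The pre-model category of mitigation can be made concrete with a short sketch of counterfactual data augmentation (CDA), one widely used pre-model technique: each training sentence is paired with a copy whose demographic terms are swapped before (re)training. The swap list below is a toy assumption; production CDA needs part-of-speech-aware rules, since words like "her" map to different forms depending on usage.

```python
# Toy sketch of counterfactual data augmentation (CDA), a pre-model
# mitigation: augment the corpus with gender-swapped copies of each sentence.
SWAP = {"he": "she", "she": "he", "him": "her", "his": "her", "her": "his"}
# NOTE: this word list is illustrative; real CDA uses POS-aware swapping,
# since e.g. "her" can map to either "him" or "his" depending on context.

def counterfactual(sentence: str) -> str:
    # Swap each listed term, leaving all other tokens untouched.
    return " ".join(SWAP.get(tok, tok) for tok in sentence.split())

def augment(corpus):
    # Keep every original sentence and add its counterfactual twin
    # whenever the swap actually changed something.
    out = []
    for s in corpus:
        out.append(s)
        cf = counterfactual(s)
        if cf != s:
            out.append(cf)
    return out

print(augment(["she wrote the report", "the results were clear"]))
```

Intra-model analogues adjust the training objective or representations directly, and post-model analogues filter or rewrite generated outputs; the review treats the trade-offs among all three.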