Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager

arXiv cs.AI / 4/2/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

Key Points

  • The arXiv paper studies whether large language models replicate gender-related biases in hiring-style evaluations, focusing on differences in recommendations and perceived qualifications.
  • It reports that, for the same résumé input, an LLM is more likely to recommend hiring a female candidate and rate them as more qualified.
  • Despite higher hire likelihood and qualification judgments for female candidates, the model is still found to recommend lower pay for women compared with men.
  • The research examines prompt engineering as a potential bias-mitigation approach, evaluating whether prompting can reduce or alter biased outputs in hiring scenarios.

Abstract

The growing prominence of large language models (LLMs) in daily life has heightened concerns that LLMs exhibit many of the same gender-related biases as their creators. In the context of hiring decisions, we quantify the degree to which LLMs perpetuate societal biases and investigate prompt engineering as a bias mitigation technique. Our findings suggest that for a given resum\'e, an LLM is more likely to hire a female candidate and perceive them as more qualified, but still recommends lower pay relative to male candidates.