Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models

arXiv cs.AI / 4/25/2026


Key Points

  • Text-to-image models can reproduce demographic and professional stereotypes, such as producing lighter skin tones for roles like “doctor” or “CEO” and more diverse (often darker) depictions for lower-status roles like “janitor.”
  • Existing bias-mitigation approaches often require retraining or curated datasets, limiting accessibility for most users.
  • The paper proposes a lightweight, inference-time prompting framework that intervenes at the prompt level without changing the underlying generative model.
  • Rather than enforcing a single notion of “fairness,” the method lets users choose among multiple fairness specifications, including uniform targets or more complex LLM-based definitions with cited sources and confidence estimates.
  • Experiments with 36 prompts across 30 occupations and 6 other contexts show skin-tone distributions shift toward the declared target and deviate less when fairness targets are specified directly in skin-tone space.

Abstract

Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as 'doctor' or 'CEO' frequently yield lighter-skinned outputs, while lower-status roles like 'janitor' show more diversity, reinforcing stereotypes. Existing mitigation methods typically require retraining or curated datasets, making them inaccessible to most users. We propose a lightweight, inference-time framework that mitigates representational bias through prompt-level intervention without modifying the underlying model. Instead of assuming a single definition of fairness, our approach allows users to select among multiple fairness specifications, ranging from simple choices such as a uniform distribution to more complex definitions informed by a large language model (LLM) that cites sources and provides confidence estimates. These distributions guide the construction of demographic-specific prompt variants in the corresponding proportions, and we evaluate alignment by auditing adherence to the declared target and measuring the resulting skin-tone distribution rather than assuming uniformity as 'fairness'. Across 36 prompts spanning 30 occupations and 6 non-occupational contexts, our method shifts observed skin-tone outcomes in directions consistent with the declared target, and reduces deviation from targets when the target is defined directly in skin-tone space (fallback). This work demonstrates how fairness interventions can be made transparent, controllable, and usable at inference time, directly empowering users of generative AI.
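The target-then-audit loop the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the group labels, prompt phrasing, and function names are all assumptions, and the paper additionally supports LLM-derived targets with cited sources, which are omitted here.

```python
import random
from collections import Counter

def allocate_prompts(base_prompt, target_dist, n_images, seed=0):
    """Build demographic-specific prompt variants in proportions
    matching a user-declared target distribution.
    (Hypothetical helper; labels/phrasing are illustrative.)"""
    rng = random.Random(seed)
    groups = list(target_dist)
    weights = [target_dist[g] for g in groups]
    # Sample one demographic descriptor per image according to the target.
    picks = rng.choices(groups, weights=weights, k=n_images)
    return [f"a photo of a {g} {base_prompt}" for g in picks]

def deviation_from_target(observed_labels, target_dist):
    """Audit step: total variation distance between the observed
    skin-tone distribution and the declared target (0 = exact match)."""
    counts = Counter(observed_labels)
    total = sum(counts.values())
    return 0.5 * sum(abs(counts.get(g, 0) / total - p)
                     for g, p in target_dist.items())

# Example: a uniform target over two illustrative skin-tone groups.
target = {"light-skinned": 0.5, "dark-skinned": 0.5}
prompts = allocate_prompts("doctor", target, n_images=10)
```

In the actual pipeline the generated images would be classified by skin tone and those labels fed to the audit; here `deviation_from_target` simply compares any label list against the declared target.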