Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
arXiv cs.CL / 4/22/2026
Key Points
- The paper studies counterfactual unfairness in LLMs using humor as a lens on social assumptions learned from training data.
- It proposes a framework covering three tasks—humor generation refusal, speaker intention inference, and relational/societal impact prediction—across both identity-agnostic and identity-disparaging humor.
- The researchers introduce interpretable bias metrics that quantify asymmetric model behavior when the identity of the speaker or addressee is swapped (see the sketch after this list).
- Experiments on state-of-the-art LLMs show consistent disparities, including higher refusal rates (up to 67.5%), higher maliciousness judgments (64.7%), and increased social-harm ratings (up to 1.5 points on a 5-point scale) for jokes attributed to privileged speakers.
- The findings suggest that generative models may simultaneously exhibit sensitivity and stereotyping, making fairness and cultural alignment more difficult to achieve.
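The general shape of such a counterfactual evaluation can be illustrated with a minimal sketch. The paper's exact prompt templates, identity terms, and metric definitions are not reproduced here, so the identity pair, the prompt wording, and the `classify`-style refusal heuristic below are illustrative assumptions rather than the authors' implementation; the sketch only shows the idea of swapping the attributed speaker and measuring the resulting asymmetry in refusal rates.

```python
# Hedged sketch: identity terms, prompt template, and the keyword-based
# refusal heuristic are assumptions for illustration, not the paper's method.

IDENTITIES = ["a White man", "a Black woman"]  # example counterfactual pair


def make_prompt(speaker_identity: str, joke: str) -> str:
    """Attribute the same joke to different speakers, keeping everything else fixed."""
    return f'{speaker_identity} tells the following joke: "{joke}". Is it acceptable to repeat it?'


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses classified as refusals (simple keyword heuristic)."""
    refusal_markers = ("i can't", "i cannot", "i won't", "i'm sorry")
    refusals = [r for r in responses if r.strip().lower().startswith(refusal_markers)]
    return len(refusals) / max(len(responses), 1)


def counterfactual_refusal_gap(responses_by_identity: dict[str, list[str]]) -> float:
    """Asymmetry metric: largest difference in refusal rate across identity-swapped prompts."""
    rates = [refusal_rate(r) for r in responses_by_identity.values()]
    return max(rates) - min(rates)


if __name__ == "__main__":
    # Toy responses standing in for actual LLM outputs to the swapped prompts.
    responses = {
        "a White man": ["I can't help with that joke.", "Sure, here it is..."],
        "a Black woman": ["Sure, here it is...", "Sure, here it is..."],
    }
    print(f"Refusal-rate gap across identities: {counterfactual_refusal_gap(responses):.2f}")
```

In this toy run the gap is 0.50, meaning the model refuses the identical joke twice as often for one attributed speaker as for the other; the paper's metrics follow the same counterfactual logic across its three tasks.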