Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

arXiv cs.AI / 4/13/2026

Key Points

  • The paper introduces “Cards Against LLMs,” a benchmark that evaluates humor alignment by having five frontier language models judge the same Cards Against Humanity (CAH) rounds as human players, then comparing the models’ picks with human preferences.
  • Across 9,894 rounds, each model selects the “funniest” option from a slate of ten candidate cards; all models beat the random baseline, but their alignment with human judgments is only modest.
  • A key finding is that model-to-model agreement is much higher than model-to-human agreement, suggesting the models share a taste among themselves that does not track human preferences.
  • The study argues that systematic position bias and content-based preferences can partially explain the misalignment, raising the question of whether these humor judgments reflect genuine preference or artifacts of inference and alignment; a sketch of a position-bias check follows this list.
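
A minimal sketch of how such a position-bias check could work, assuming each round’s pick is logged as a slot index from 0 to 9; the function name and toy data below are hypothetical, not from the paper:

```python
from collections import Counter
from scipy.stats import chisquare

NUM_SLOTS = 10  # each round presents ten candidate cards

def position_bias(picked_slots: list[int]) -> tuple[list[int], float]:
    """Tally how often each slot is picked and test against uniform choice.

    A very small p-value, with counts piling up in particular slots,
    would point to systematic position bias rather than content-driven picks.
    """
    counts = Counter(picked_slots)
    observed = [counts.get(i, 0) for i in range(NUM_SLOTS)]
    # chisquare defaults to uniform expected frequencies across slots.
    _, p_value = chisquare(observed)
    return observed, p_value

# Toy illustration: hypothetical picks skewed toward the first slot.
picks = [0, 0, 1, 0, 2, 0, 9, 0, 3, 0, 0, 5, 0, 1, 0, 4, 2, 0, 0, 0]
observed, p = position_bias(picks)
print(observed, f"p = {p:.4f}")
```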

Abstract

Humor is one of the most culturally embedded and socially significant dimensions of human communication, yet it remains largely unexplored as a dimension of Large Language Model (LLM) alignment. In this study, five frontier language models play the same Cards Against Humanity (CAH) games as human players. The models select the funniest response from a slate of ten candidate cards across 9,894 rounds. While all models exceed the random baseline, alignment with human preference remains modest. More striking is that models agree with each other substantially more often than they agree with humans. We show that this shared preference is partly explained by systematic position biases and content preferences, raising the question of whether LLM humor judgment reflects genuine preference or structural artifacts of inference and alignment.
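
As a rough illustration of the evaluation the abstract describes, the sketch below computes each model’s alignment with humans against the 1/10 random baseline, plus mean pairwise model-to-model agreement, assuming per-round picks are stored as card indices; all names and the toy data are hypothetical, not from the paper:

```python
from itertools import combinations

RANDOM_BASELINE = 1 / 10  # ten candidate cards per round

def agreement(picks_a: list[int], picks_b: list[int]) -> float:
    """Fraction of rounds in which two judges picked the same card."""
    assert len(picks_a) == len(picks_b)
    return sum(a == b for a, b in zip(picks_a, picks_b)) / len(picks_a)

def report(model_picks: dict[str, list[int]], human_picks: list[int]) -> None:
    """Print each model's alignment with humans, then the mean
    agreement over all model-model pairs."""
    for name, picks in sorted(model_picks.items()):
        print(f"{name}: human agreement {agreement(picks, human_picks):.3f} "
              f"(random baseline {RANDOM_BASELINE:.2f})")
    pairs = list(combinations(model_picks, 2))
    mean_mm = sum(agreement(model_picks[a], model_picks[b])
                  for a, b in pairs) / len(pairs)
    print(f"mean model-to-model agreement: {mean_mm:.3f}")

# Toy illustration with three hypothetical models over five rounds.
human = [3, 1, 7, 0, 5]
models = {
    "model_a": [3, 2, 7, 4, 5],
    "model_b": [3, 2, 6, 4, 5],
    "model_c": [3, 2, 7, 4, 0],
}
report(models, human)
```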