ProText: A benchmark dataset for measuring (mis)gendering in long-form texts
arXiv cs.CL · March 31, 2026
Key Points
- ProText is a new benchmark dataset designed to measure gendering and misgendering in long-form English texts across stylistically diverse inputs.
- The dataset annotates three key dimensions: theme nouns (names, occupations, titles, kinship terms), theme category (stereotypically male or female vs. gender-neutral/non-gendered), and pronoun category (masculine, feminine, gender-neutral, or none).
- It is specifically intended to evaluate how state-of-the-art LLMs may introduce or amplify (mis)gendering during text transformations such as summarization and rewrites, going beyond pronoun-resolution benchmarks and the gender binary.
- A mini validation study suggests systematic gender bias, especially when the input lacks explicit gender cues, with models tending to default to heteronormative assumptions.
- ProText aims to enable more nuanced analysis of bias, stereotyping, and misgendering effects even with relatively small sets of prompts and models.
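The three annotation dimensions above can be pictured as a simple record type. This is a minimal sketch, not the dataset's actual schema: the class, field names, and category strings are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical ProText-style record; field names and category labels are
# illustrative assumptions, not the published schema.
@dataclass
class ProTextExample:
    text: str              # long-form input passage
    theme_noun: str        # e.g. a name, occupation, title, or kinship term
    theme_category: str    # "stereotypically_male" | "stereotypically_female" | "neutral"
    pronoun_category: str  # "masculine" | "feminine" | "gender_neutral" | "none"


def cue_free_inputs(examples):
    """Select inputs with no explicit pronoun cues — the case where the
    validation study suggests models most often default to gendered
    assumptions."""
    return [ex for ex in examples if ex.pronoun_category == "none"]


examples = [
    ProTextExample("The nurse finished their shift and went home.",
                   "nurse", "stereotypically_female", "none"),
    ProTextExample("She reviewed the contract before signing.",
                   "lawyer", "neutral", "feminine"),
]
print(len(cue_free_inputs(examples)))  # → 1
```

Filtering on `pronoun_category == "none"` isolates exactly the inputs the mini validation study flags as most bias-prone, so a transformed output can then be checked for pronouns the source never licensed.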