IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language
arXiv cs.CL / 4/21/2026
Key Points
- The paper finds that existing AI-based content moderation often cannot reliably distinguish reclaimed slur usage from hateful usage, leading to the suppression of marginalized communities' voices.
- Combining quantitative and qualitative analyses, the researchers build and analyze an annotated corpus of reclaimed slur usage (e.g., the f-word, n-word, and b-word) across LGBTQIA+, Black, and women communities.
- Annotation shows low inter-annotator agreement even among in-group annotators, suggesting that interpretations of reclaimed slurs are highly subjective and depend on nuanced context (see the agreement sketch after this list).
- The study reports poor alignment between human judgments and automated hate-speech scores from Perspective API; annotators' decisions track more closely with whether a usage is derogatory and whether the slur is directed at the speaker themselves (see the Perspective API sketch after this list).
- Semi-structured interviews indicate that differences in lived experience and personal history drive variation in interpretations, underscoring the limits of current automated moderation approaches.
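A minimal sketch of the kind of agreement analysis the paper describes, using pairwise Cohen's kappa via scikit-learn. The annotator names and labels below are hypothetical, and the paper's exact agreement metric may differ.

```python
# Hedged sketch: pairwise inter-annotator agreement with Cohen's kappa.
# All annotators and labels here are hypothetical, not the paper's data.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary judgments over the same six posts:
# 1 = "reclaimed use", 0 = "hateful use".
annotations = {
    "annotator_a": [1, 1, 0, 1, 0, 1],
    "annotator_b": [1, 0, 0, 1, 1, 0],
    "annotator_c": [0, 1, 0, 1, 1, 1],
}

# Compute kappa for every annotator pair; values near 0 indicate
# chance-level agreement, consistent with highly subjective labels.
pairs = list(combinations(annotations, 2))
kappas = [cohen_kappa_score(annotations[a], annotations[b]) for a, b in pairs]
for (a, b), k in zip(pairs, kappas):
    print(f"{a} vs {b}: kappa = {k:.2f}")
print(f"mean pairwise kappa = {sum(kappas) / len(kappas):.2f}")
```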
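For the automated side of the comparison, the paper uses Perspective API. The sketch below shows one plausible way to request a TOXICITY score for a post; the API key placeholder and attribute choice are assumptions for illustration, not the paper's pipeline.

```python
# Hedged sketch: scoring text with Perspective API's TOXICITY attribute.
# PERSPECTIVE_API_KEY is a placeholder; the request shape follows the
# public commentanalyzer v1alpha1 API.
import requests

PERSPECTIVE_API_KEY = "YOUR_API_KEY"  # placeholder, not a real key
URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    f"comments:analyze?key={PERSPECTIVE_API_KEY}"
)

def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY probability (0-1) for `text`."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Reclaimed and hateful uses of the same slur can receive similar
# scores, which is the human-machine misalignment the paper documents.
print(toxicity_score("example post text"))
```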