Probing Cultural Signals in Large Language Models through Author Profiling
arXiv cs.CL / 3/18/2026
Key Points
- The paper evaluates whether large language models can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning.
- Across open-source models and more than 10,000 lyrics, the study finds non-trivial profiling performance but also systematic cultural alignment: most models default to predicting North American ethnicity, while some (e.g., DeepSeek-1.5B) skew toward Asian ethnicity.
- The authors introduce two fairness metrics, Modality Accuracy Divergence (MAD) and Recall Divergence (RD), to quantify disparities in model outputs and biases across models.
- They report model-specific bias differences, noting Ministral-8B as having the strongest ethnicity bias and Gemma-12B as the most balanced, and provide code on GitHub for replication.
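The paper's exact definitions of MAD and RD are not reproduced here, but the idea of a recall-style divergence across demographic groups can be sketched in a few lines. The gap measure below (best-served minus worst-served group recall) and the toy labels are illustrative assumptions, not the authors' formulas:

```python
from collections import defaultdict

def recall_per_group(y_true, y_pred, groups):
    """Recall within each demographic group: correct / total for that group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        if t == p:
            correct[g] += 1
    return {g: correct[g] / total[g] for g in total}

def recall_divergence(y_true, y_pred, groups):
    """Illustrative divergence: gap between the best- and worst-served group."""
    r = recall_per_group(y_true, y_pred, groups)
    return max(r.values()) - min(r.values())

# Toy example: ethnicity predictions for six lyrics, grouped by true label.
y_true = ["NA", "NA", "AS", "AS", "EU", "EU"]
y_pred = ["NA", "NA", "NA", "AS", "NA", "EU"]

print(recall_per_group(y_true, y_pred, y_true))   # {'NA': 1.0, 'AS': 0.5, 'EU': 0.5}
print(recall_divergence(y_true, y_pred, y_true))  # 0.5
```

A divergence of 0 would mean every group is recalled equally well; the default-to-North-American behavior reported above would show up here as inflated "NA" recall at the expense of other groups.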