Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks
arXiv cs.CL / March 20, 2026
Key Points
- The paper demonstrates that LLM-based graders show implicit bias based on writing style in Essay/Writing tasks, even when explicitly instructed to grade content correctness alone (a prompt sketch of such an instruction follows this list).
- The study used 180 student responses across Mathematics, Programming, and Essay tasks and compared two open-source models, LLaMA 3.3 70B and Qwen 2.5 72B.
- Results show statistically significant bias in Essay/Writing tasks (p < 0.05), with effect sizes ranging from medium to very large, including score penalties for informal language and non-native phrasing on a 10-point scale.
- By contrast, Mathematics and Programming tasks showed minimal bias, indicating that grading fairness is task-dependent; the authors call for bias-auditing protocols, along the lines sketched below, before LLM graders are adopted.
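
To make the setup concrete, a content-only grading prompt of the kind the first bullet describes might look like the sketch below. The wording, the 0-10 output convention, and the `call_model` hook are assumptions for illustration, not the authors' actual prompt.

```python
# Hypothetical content-focused grading prompt, in the spirit of the
# paper's setup; the wording and the call_model() hook are assumptions.
GRADER_PROMPT = """You are grading a student response on a 10-point scale.
Score ONLY the correctness and completeness of the content.
Ignore writing style, tone, formality, and fluency.

Task: {task}
Student response: {response}

Reply with a single integer from 0 to 10."""

def grade(task: str, response: str, call_model) -> int:
    """call_model: any function that sends a prompt string to an LLM
    (e.g., LLaMA 3.3 70B or Qwen 2.5 72B) and returns its text reply."""
    reply = call_model(GRADER_PROMPT.format(task=task, response=response))
    return int(reply.strip().split()[0])  # parse the leading integer
```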
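A minimal bias audit in the spirit of the authors' recommendation might grade matched pairs of responses whose content is identical but whose style differs (say, a formal and an informal rewrite of the same answer), then test whether the scores shift. The paired t-test and Cohen's d below are illustrative choices, as are the helper names; the summary does not specify the paper's exact statistical procedure.

```python
# Illustrative style-bias audit for an LLM grader (assumed design, not
# the paper's exact protocol). Requires scipy: pip install scipy
from statistics import mean, stdev
from scipy.stats import ttest_rel  # paired t-test

def audit_style_bias(pairs, score):
    """pairs: list of (formal_text, informal_text) tuples with identical
    content; score: a function mapping one response string to a number
    (e.g., the grader above with task and call_model bound via partial).
    Returns the mean style penalty, p-value, and paired Cohen's d."""
    formal = [score(f) for f, _ in pairs]
    informal = [score(i) for _, i in pairs]
    diffs = [a - b for a, b in zip(formal, informal)]
    t_stat, p_value = ttest_rel(formal, informal)
    d = mean(diffs) / stdev(diffs)  # paired Cohen's d (d_z)
    return {
        "mean_penalty": mean(diffs),  # points lost to informal style
        "p_value": p_value,           # flag bias if p < 0.05
        "cohens_d": d,                # ~0.5 medium, ~0.8 large
    }
```

On the paper's reported pattern, such an audit would show a significant penalty with a medium-to-very-large effect size on essay items but not on math or programming items.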