When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
arXiv cs.CL / 4/9/2026
Key Points
- The paper introduces the Graded Color Attribution (GCA) dataset as a controlled benchmark for studying when vision-language models (VLMs) will behave unexpectedly and whether they follow their own stated introspective rules.
- In GCA, both humans and VLMs state a pixel-level threshold rule for when an object should be labeled with a given color, based on minimum color coverage under multiple recoloring conditions (see the sketch after this list).
- Results show humans remain largely faithful to their stated rules, and apparent human “violations” are attributed to overestimation of color coverage rather than rule-breaking.
- In contrast, VLMs systematically contradict their own introspective rules, even when they are strong estimators of color coverage, with GPT-5-mini violating stated rules in nearly 60% of cases under strong color priors.
- The findings indicate that world-knowledge priors reduce introspection faithfulness for models in ways that do not mirror human behavior, suggesting VLM self-knowledge is miscalibrated and raising concerns for trustworthy, high-stakes deployment.
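To make the coverage-threshold setup concrete, here is a minimal sketch (not from the paper) of how a stated rule like "call the apple red only if at least X% of its pixels are red" can be checked against an actual answer. The function names, the per-channel tolerance, and the example threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def color_coverage(pixels_rgb: np.ndarray, target_rgb: tuple, tol: int = 30) -> float:
    """Fraction of an object's pixels within `tol` of the target color on every channel.

    `pixels_rgb` holds the object's pixels with RGB values in the last axis,
    e.g. shape (N, 3) or (H, W, 3).
    """
    diffs = np.abs(pixels_rgb.astype(int) - np.array(target_rgb, dtype=int))
    matches = np.all(diffs <= tol, axis=-1)
    return float(matches.mean())

def consistent_with_stated_rule(pixels_rgb: np.ndarray,
                                target_rgb: tuple,
                                stated_min_coverage: float,
                                answered_yes: bool) -> bool:
    """Return True if the model's (or person's) yes/no color attribution
    agrees with its own stated minimum-coverage rule."""
    coverage = color_coverage(pixels_rgb, target_rgb)
    rule_says_yes = coverage >= stated_min_coverage
    return rule_says_yes == answered_yes

# Illustrative example: an apple whose pixels are ~40% red, paired with a stated
# rule of "at least 50% coverage". Answering "yes, it's red" would then count as
# a violation of the stated introspective rule.
```

Under this framing, the paper's human/VLM contrast amounts to whether disagreements between the answer and the rule come from misestimating `coverage` (as with humans overestimating color coverage) or from ignoring the rule despite estimating coverage well (as reported for the VLMs).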