Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs Supplementary
arXiv cs.CV / 3/13/2026
Key Points
- DORI is a cognitively grounded benchmark that makes object orientation the primary target and decomposes it into four dimensions evaluated at coarse and granular levels.
- It uses 13,652 images from 14 sources to create 33,656 multiple-choice questions across 67 object categories, with bounding-box isolation, standardized spatial reference frames, and structured prompts to isolate orientation.
- Evaluating 24 state-of-the-art vision-language models reveals that models strong on general spatial tasks perform near-random on orientation reasoning, with the best model reaching only 54.2% accuracy on coarse and 45.0% on granular orientation judgments.
- The results indicate orientation understanding remains an unsolved challenge for multimodal systems and have implications for robotic manipulation, 3D scene reconstruction, and human-AI interaction.
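The coarse and granular accuracy figures above can be reproduced from per-question results with a simple scoring loop. The sketch below is a minimal illustration of that evaluation logic, not the benchmark's actual harness; the record fields (`level`, `options`, `answer`, `pred`) are hypothetical names assumed for this example.

```python
# Minimal sketch (hypothetical data schema) of scoring a model on
# DORI-style multiple-choice orientation questions at two granularities.
# Field names "level", "options", "answer", "pred" are illustrative
# assumptions, not the benchmark's real format.

def score(questions):
    """Return accuracy and the random-chance baseline per evaluation level."""
    results = {}
    for level in ("coarse", "granular"):
        subset = [q for q in questions if q["level"] == level]
        if not subset:
            continue
        correct = sum(q["pred"] == q["answer"] for q in subset)
        # Chance level for an MCQ is 1 / number of options, averaged
        # over the subset in case option counts vary by question.
        chance = sum(1 / len(q["options"]) for q in subset) / len(subset)
        results[level] = {
            "accuracy": correct / len(subset),
            "chance": chance,
        }
    return results

# Example: two coarse 4-way questions, one answered correctly.
demo = [
    {"level": "coarse", "options": ["N", "E", "S", "W"],
     "answer": "N", "pred": "N"},
    {"level": "coarse", "options": ["N", "E", "S", "W"],
     "answer": "E", "pred": "S"},
]
print(score(demo))  # coarse accuracy 0.5 vs. 0.25 chance
```

Comparing accuracy to the per-question chance rate is what supports the paper's "near-random" characterization: a model whose accuracy is close to the chance value carries essentially no orientation signal.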