Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs Supplementary
arXiv cs.CV / 3/13/2026
Key Points
- DORI is a cognitively grounded benchmark that makes object orientation the primary target and decomposes it into four dimensions evaluated at coarse and granular levels.
- It uses 13,652 images from 14 sources to create 33,656 multiple-choice questions across 67 object categories, with bounding-box isolation, standardized spatial reference frames, and structured prompts to isolate orientation.
- An evaluation of 24 state-of-the-art vision-language models shows that models strong on general spatial tasks perform near-random on orientation reasoning; the best model reaches 54.2% accuracy on coarse judgments and 45.0% on granular judgments.
- The results indicate that orientation understanding remains an unsolved challenge for multimodal systems, with implications for robotic manipulation, 3D scene reconstruction, and human-AI interaction.
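
To make the "near-random" comparison concrete, here is a minimal Python sketch of how per-level accuracy on multiple-choice questions could be set against a chance baseline. It is not the benchmark's actual evaluation code: the record fields (`level`, `predicted`, `answer`, `num_choices`), the function name `score_by_level`, and the toy data are all hypothetical, and the number of answer options per DORI question is not stated in this summary.

```python
from collections import defaultdict


def score_by_level(records):
    """Return {level: (accuracy, chance_baseline)} for multiple-choice records.

    Each record is a dict with hypothetical keys:
      level       -- e.g. "coarse" or "granular"
      predicted   -- the option the model chose
      answer      -- the correct option
      num_choices -- how many options the question offered
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    chance = defaultdict(float)
    for r in records:
        level = r["level"]
        total[level] += 1
        correct[level] += int(r["predicted"] == r["answer"])
        # Uniform random guessing succeeds with probability 1 / num_choices.
        chance[level] += 1.0 / r["num_choices"]
    return {
        level: (correct[level] / total[level], chance[level] / total[level])
        for level in total
    }


# Toy data for illustration only; these are not DORI questions or results.
records = [
    {"level": "coarse", "predicted": "A", "answer": "A", "num_choices": 4},
    {"level": "coarse", "predicted": "B", "answer": "A", "num_choices": 4},
    {"level": "granular", "predicted": "C", "answer": "C", "num_choices": 8},
    {"level": "granular", "predicted": "D", "answer": "C", "num_choices": 8},
    {"level": "granular", "predicted": "E", "answer": "C", "num_choices": 8},
    {"level": "granular", "predicted": "F", "answer": "C", "num_choices": 8},
]

scores = score_by_level(records)
for level, (accuracy, baseline) in scores.items():
    print(f"{level}: accuracy={accuracy:.3f}, chance={baseline:.3f}")
```

Reporting the chance baseline next to each accuracy matters because the same raw percentage means different things depending on how many options a question offers.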