OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Models
arXiv cs.CV / 4/23/2026
📰 News · Models & Research
Key Points
- The paper introduces OMIBench, a new benchmark for evaluating Olympiad-level multi-image reasoning in large vision-language models (LVLMs).
- It addresses a limitation of prior benchmarks by requiring evidence to be distributed across multiple images rather than focusing mainly on single-image analysis.
- OMIBench includes problems spanning biology, chemistry, mathematics, and physics Olympiads, with manually annotated rationales and protocols for both exact and semantic answer matching.
- Experiments reveal substantial performance gaps among existing models, with even the strongest LVLMs (e.g., Gemini-3-Pro) reaching only around 50% accuracy.
- The authors propose OMIBench as a targeted resource for studying and improving multi-image reasoning capabilities in LVLMs.
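The key points mention that OMIBench scores answers with both exact and semantic matching protocols. The paper's actual implementation is not described here, so the following is only a minimal sketch of what such a two-tier matcher typically looks like: a normalized string comparison for exact matching, and a toy token-overlap score standing in for the semantic tier (a real protocol would more likely use an LLM judge or embedding similarity). All function names and the threshold value are illustrative assumptions.

```python
# Hedged sketch of a two-tier answer matcher; not OMIBench's actual protocol.
import re


def normalize(ans: str) -> str:
    """Lowercase and collapse punctuation/whitespace for exact matching."""
    return re.sub(r"[^a-z0-9]+", " ", ans.lower()).strip()


def exact_match(pred: str, gold: str) -> bool:
    """Exact tier: normalized strings must be identical."""
    return normalize(pred) == normalize(gold)


def semantic_match(pred: str, gold: str, threshold: float = 0.5) -> bool:
    """Semantic tier (toy proxy): Jaccard overlap of normalized tokens.
    A real benchmark would use an LLM judge or embedding similarity."""
    p, g = set(normalize(pred).split()), set(normalize(gold).split())
    if not p or not g:
        return False
    return len(p & g) / len(p | g) >= threshold
```

In practice a benchmark would report exact-match accuracy where answers are short and canonical (e.g., numeric physics results) and fall back to the semantic tier for free-form rationale-style answers.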