OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning
arXiv cs.CL / 4/29/2026
💬 Opinion · Models & Research
Key Points
- The paper introduces OMHBench, a new benchmark (6,144 questions) built to test omni-modal multi-hop reasoning across text, vision, and speech with balanced, jointly grounded reasoning paths (a hypothetical sketch of such a question appears after this list).
- It argues that existing MLLM evaluation frameworks are flawed because they allow modality shortcuts and biased reasoning trajectories.
- Evaluations of 13 state-of-the-art MLLMs show a substantial performance gap between proprietary and open-source models.
- The study finds that even proprietary models remain highly sensitive to variations in the reasoning path, which leads to uneven grounding across modalities.
- Models perform worst when processing the speech modality, highlighting the need for balanced omni-modal, multi-hop evaluation rather than text/vision-only testing.
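To make the "balanced, jointly grounded reasoning path" idea concrete, here is a minimal sketch of what an omni-modal multi-hop question record might look like. This is a hypothetical illustration, not the OMHBench schema: the class and field names (`OmniModalQuestion`, `ReasoningHop`, `modality_coverage`) and the example contents are assumptions. It only encodes the properties stated above: each question chains hops that must each be grounded in a specific modality (text, vision, or speech), and a balanced benchmark would verify that no single modality suffices to answer.

```python
from dataclasses import dataclass
from typing import List, Literal, Set

# Hypothetical schema: names and structure are illustrative assumptions,
# not taken from the OMHBench paper.
Modality = Literal["text", "vision", "speech"]

@dataclass
class ReasoningHop:
    modality: Modality          # modality this hop must be grounded in
    evidence_id: str            # pointer to the supporting document/image/audio clip
    intermediate_answer: str    # fact this hop contributes to the reasoning chain

@dataclass
class OmniModalQuestion:
    question: str
    hops: List[ReasoningHop]    # multi-hop chain; a balanced benchmark distributes
                                # hops across modalities rather than favoring one
    answer: str

def modality_coverage(q: OmniModalQuestion) -> Set[str]:
    """Return the set of modalities a question actually requires.

    Useful as a sanity check that no single-modality shortcut exists.
    """
    return {hop.modality for hop in q.hops}

# Example instance (contents invented for illustration):
example = OmniModalQuestion(
    question="Which city shown in the photo is mentioned in the podcast clip?",
    hops=[
        ReasoningHop("vision", "img_042", "The photo shows the Eiffel Tower, i.e. Paris."),
        ReasoningHop("speech", "clip_017", "The speaker names Paris as their destination."),
    ],
    answer="Paris",
)
assert modality_coverage(example) == {"vision", "speech"}
```

The `modality_coverage` check is one way to make the "no modality shortcut" requirement testable: a question whose hops cover only one modality could be answered without joint grounding, which is exactly the evaluation flaw the paper argues against.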