OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning

arXiv cs.CL / 4/29/2026

💬 Opinion · Models & Research

Key Points

  • The paper introduces OMHBench, a new benchmark (6,144 questions) built to test omni-modal multi-hop reasoning across text, vision, and speech with balanced, jointly grounded reasoning paths.
  • It argues that existing MLLM evaluation frameworks are flawed because they allow modality shortcuts and biased reasoning trajectories.
  • Evaluations of 13 state-of-the-art MLLMs show a substantial performance gap between proprietary and open-source models.
  • The study finds proprietary models are still highly sensitive to how reasoning paths vary, leading to uneven grounding across modalities.
  • Models struggle most with the speech modality, underscoring the need for balanced omni-modal, multi-hop evaluation rather than text- and vision-only testing.

Abstract

Multimodal Large Language Models (MLLMs) increasingly support omni-modal processing across text, vision, and speech. However, existing evaluation frameworks for such models suffer from critical limitations, including modality shortcuts and biased reasoning paths. To address these challenges, we propose OMHBench, a novel benchmark designed to rigorously evaluate omni-modal multi-hop reasoning. It consists of 6,144 questions with balanced reasoning paths that are jointly grounded across all three modalities. Extensive evaluation of 13 state-of-the-art models reveals that (1) a large performance gap exists between proprietary and open-source MLLMs and (2) even proprietary models exhibit high sensitivity to reasoning path variations, resulting in asymmetric omni-modal grounding. Notably, models struggle when processing the speech modality, underscoring the need for balanced, multi-hop evaluation of omni-modal intelligence.
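
To make the benchmark's design concrete: each OMHBench item pairs a question with a multi-hop reasoning path whose hops are grounded in different modalities, and the reported sensitivity to path variations can be read off a per-path accuracy breakdown. The sketch below is purely illustrative and assumes a hypothetical item layout; the OmniHopItem fields, the accuracy_by_path helper, and the toy data are not taken from the paper.

    from dataclasses import dataclass
    from collections import defaultdict

    # Hypothetical item layout: one multi-hop question whose evidence ("hops")
    # is spread across text, image, and speech inputs. Field names are assumptions.
    @dataclass
    class OmniHopItem:
        question: str
        hop_modalities: list   # reasoning path, e.g. ["vision", "speech", "text"]
        answer: str

    def accuracy_by_path(items, predictions):
        """Exact-match accuracy grouped by reasoning-path signature, so
        sensitivity to path variations becomes visible in the breakdown."""
        correct, total = defaultdict(int), defaultdict(int)
        for item, pred in zip(items, predictions):
            path = "->".join(item.hop_modalities)
            total[path] += 1
            correct[path] += int(pred.strip().lower() == item.answer.strip().lower())
        return {p: correct[p] / total[p] for p in total}

    # Toy usage with made-up examples (not OMHBench data).
    items = [
        OmniHopItem("Which city is named in the audio clip playing beside the poster?",
                    ["vision", "speech", "text"], "Lisbon"),
        OmniHopItem("What year does the narrator attribute to the pictured event?",
                    ["speech", "vision", "text"], "1969"),
    ]
    print(accuracy_by_path(items, ["Lisbon", "1955"]))
    # {'vision->speech->text': 1.0, 'speech->vision->text': 0.0}

A breakdown of this shape is one simple way to surface the asymmetric grounding the authors describe: a model that is strong on vision-first paths but weak on speech-first paths would show it directly in the per-path scores.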