HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning

arXiv cs.CL / 4/20/2026

Key Points

  • The paper introduces HyperGVL, the first benchmark designed to evaluate large vision-language models (LVLMs) on hypergraph understanding and reasoning tasks.
  • HyperGVL tests 12 advanced LVLMs on 84,000 vision-language QA samples across 12 tasks, ranging from basic component counting to NP-hard-style reasoning (a toy counting example is sketched after this list).
  • The benchmark includes both multiscale synthetic hypergraph structures and real-world citation and protein networks to better reflect practical hypergraph settings.
  • The authors study how 12 different textual and visual hypergraph representations affect performance, and propose WiseHyGR, a generalizable router that learns adaptive hypergraph representations to improve LVLM results.
  • Overall, the work aims to clarify LVLM capability boundaries on hypergraphs and strengthen the connection between hypergraph modeling and vision-language reasoning.
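
To make the counting end of that task spectrum concrete, here is a minimal sketch of a hypergraph and the kind of component-counting questions the benchmark describes. The data structure and all names are illustrative assumptions, not HyperGVL's actual task schema.

```python
# A toy hypergraph: hyperedge name -> member vertices.
# Purely illustrative; HyperGVL's real format is not specified in this summary.
hypergraph = {
    "e1": {"v1", "v2", "v3"},   # unlike a graph edge, a hyperedge may join any number of vertices
    "e2": {"v2", "v4"},
    "e3": {"v1", "v3", "v4", "v5"},
}

vertices = set().union(*hypergraph.values())

# Component-counting questions of the kind the benchmark's easiest tasks pose:
print("vertex count:", len(vertices))                                 # 5
print("hyperedge count:", len(hypergraph))                            # 3
print("degree of v3:", sum("v3" in e for e in hypergraph.values()))   # 2
```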

Abstract

Large Vision-Language Models (LVLMs) continually need new arenas in which to chart their expanding capabilities, yet their abilities on hypergraphs remain unexplored. Hypergraphs have significant practical applications in areas such as the life sciences and social communities. Although recent LVLMs have shown promise in understanding complex topologies, no benchmark yet delineates what they can do with hypergraphs, leaving the boundaries of their abilities unclear. To fill this gap, we introduce HyperGVL, the first benchmark to evaluate the proficiency of LVLMs in hypergraph understanding and reasoning. HyperGVL provides a comprehensive assessment of 12 advanced LVLMs across 84,000 vision-language question-answering (QA) samples spanning 12 tasks, ranging from basic component counting to complex NP-hard problem reasoning. The hypergraphs involved span multiscale synthetic structures as well as real-world citation and protein networks. Moreover, we examine the effects of 12 textual and visual hypergraph representations and introduce WiseHyGR, a generalizable router that improves LVLM performance on hypergraphs by learning adaptive representations. We believe this work is a step forward in connecting hypergraphs with LVLMs.
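
To illustrate what routing over hypergraph representations can mean in practice, the sketch below serializes the same hypergraph in two textual formats and picks one per input. This is a hand-written, size-based stand-in: the paper's WiseHyGR learns this routing, and none of the function names below come from the paper.

```python
# Two textual hypergraph serializations of the kind such a representation study
# might compare, plus a trivial router over them. The rule in route() is only
# a stand-in for a learned router like WiseHyGR.
Hypergraph = dict[str, set[str]]  # hyperedge name -> member vertices

def as_edge_list(hg: Hypergraph) -> str:
    """One line per hyperedge, e.g. 'e1: v1, v2, v3'."""
    return "\n".join(f"{e}: {', '.join(sorted(vs))}" for e, vs in sorted(hg.items()))

def as_incidence_matrix(hg: Hypergraph) -> str:
    """A 0/1 grid with vertices as rows and hyperedges as columns."""
    edges = sorted(hg)
    vertices = sorted(set().union(*hg.values()))
    header = "     " + " ".join(f"{e:>2}" for e in edges)
    rows = [f"{v:>4} " + "  ".join("1" if v in hg[e] else "0" for e in edges)
            for v in vertices]
    return "\n".join([header] + rows)

def route(hg: Hypergraph) -> str:
    """Pick a serialization per input: compact grids for small hypergraphs,
    edge lists for large ones where the grid would be mostly zeros."""
    n_cells = len(hg) * len(set().union(*hg.values()))
    render = as_incidence_matrix if n_cells <= 64 else as_edge_list
    return render(hg)

hg = {"e1": {"v1", "v2", "v3"}, "e2": {"v2", "v4"}}
print(f"Here is a hypergraph:\n{route(hg)}\nHow many hyperedges does it contain?")
```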