Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning
arXiv cs.CV / 4/3/2026
Key Points
- The paper argues that current large vision-language models often fail when an image serves as a clue rather than the answer itself, i.e., when solving requires multi-step cognitive reasoning beyond explicit visual recognition.
- It introduces RebusBench, a benchmark of 1,164 rebus puzzles that tests neurosymbolic capability through three required steps: extracting perceived visual attributes into language, retrieving idioms and other linguistic priors, and abstractly mapping between the two to produce a meaning that lies outside pixel space.
- Evaluations of models such as Qwen, InternVL, and LLaVA reveal severe limitations, with scores plateauing below 10% Exact Match and 20% semantic accuracy (a minimal scoring sketch follows this list).
- The authors report no significant gains from model scaling or in-context learning, suggesting the failures stem from missing “reasoning glue” rather than from missing raw visual or linguistic components.
- The work positions rebus-style tasks as a diagnostic of how well models integrate visual understanding with external knowledge and systematic reasoning.
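
The two headline metrics imply a strict and a lenient scoring pass over each predicted answer. Below is a minimal sketch of such a harness, assuming normalized-string equality for Exact Match and a sentence-embedding cosine-similarity threshold as a stand-in for the paper's semantic-accuracy protocol; the embedding model (`all-MiniLM-L6-v2`), the 0.8 threshold, and the function names and example puzzle are illustrative assumptions, not details taken from the paper.

```python
# A minimal scoring sketch, NOT the paper's protocol: Exact Match is
# assumed to be normalized string equality, and semantic accuracy is
# approximated by a sentence-embedding cosine-similarity threshold.
import re

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def normalize(text: str) -> str:
    """Lowercase and drop punctuation so formatting differences don't count."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def exact_match(prediction: str, answer: str) -> bool:
    """Strict EM: the normalized strings must be identical."""
    return normalize(prediction) == normalize(answer)

def semantic_match(prediction: str, answer: str, threshold: float = 0.8) -> bool:
    """Lenient credit: embeddings of the two answers must be similar enough."""
    emb = model.encode([prediction, answer], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

# Hypothetical rebus: pictures of an eye, a heart, and a ewe encode "I love you".
prediction, gold = "eye heart ewe", "I love you"
print(exact_match(prediction, gold))     # False: literal object naming
print(semantic_match(prediction, gold))  # almost certainly False: no idiom mapping
```

The sketch also illustrates the failure mode the benchmark targets: a model that merely names the depicted objects scores zero on both metrics, because the intended answer lives in the idiom, not in the pixels.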