HighlightBench: Benchmarking Markup-Driven Table Reasoning in Scientific Documents
arXiv cs.CV / March 31, 2026
Key Points
- The paper introduces HighlightBench, a diagnostic benchmark focused on how well multimodal LLMs interpret visual markup cues (e.g., highlights, underlines, bold) as logical directives for reasoning over scientific tables.
- It addresses a key evaluation blind spot by distinguishing failures to perceive the markup from failures to reason with it, using five task families.
- The five task families (Markup Grounding, Constrained Retrieval, Local Relations, Aggregation & Comparison, and Consistency & Missingness) cover both perception and structured table-reasoning behaviors.
- A reference pipeline is provided that makes intermediate decisions explicit, enabling more reproducible baselines and more granular error attribution across the perception-to-execution chain (see the sketch after this list).
- Experimental results indicate that even strong models can be unstable when visual cues must be consistently aligned with symbolic reasoning under structured output constraints.
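To make the "explicit intermediate decisions" idea concrete, here is a minimal sketch of a staged perception-to-execution pipeline that records each stage's output so a wrong final answer can be attributed to a specific step. The stage names, dataclasses, and failure taxonomy below are illustrative assumptions, not the paper's actual API or code.

```python
# Hypothetical sketch of a staged pipeline with explicit intermediate
# decisions for error attribution. Schema and stage names are assumptions.
from dataclasses import dataclass, field


@dataclass
class MarkupSpan:
    """One visually marked table region (hypothetical schema)."""
    cue: str   # e.g., "highlight", "underline", "bold"
    row: int
    col: int


@dataclass
class Trace:
    """Explicit record of each intermediate decision, so a wrong answer
    can be attributed to grounding, retrieval, or aggregation."""
    grounded: list[MarkupSpan] = field(default_factory=list)
    retrieved: list[str] = field(default_factory=list)
    answer: str | None = None
    failed_stage: str | None = None


def run_pipeline(table: list[list[str]],
                 gold_spans: list[MarkupSpan],
                 model_spans: list[MarkupSpan]) -> Trace:
    """Toy perception-to-execution chain: ground the markup, retrieve
    only the marked cells, then aggregate. Each stage commits its
    output to the trace before the next stage runs."""
    trace = Trace()

    # Stage 1: markup grounding ("was the markup seen?").
    trace.grounded = model_spans
    if {(s.cue, s.row, s.col) for s in model_spans} != \
       {(s.cue, s.row, s.col) for s in gold_spans}:
        trace.failed_stage = "grounding"
        return trace

    # Stage 2: constrained retrieval of only the marked cells.
    trace.retrieved = [table[s.row][s.col] for s in model_spans]

    # Stage 3: aggregation ("can the model reason with the markup?").
    try:
        trace.answer = str(sum(float(v) for v in trace.retrieved))
    except ValueError:
        trace.failed_stage = "aggregation"
    return trace
```

Because every stage writes its decision into the trace before control passes onward, an evaluator can separate "markup not seen" errors (grounding mismatch) from "reasoning with the markup" errors (retrieval or aggregation), which is the distinction the benchmark is built around.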