Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning
arXiv cs.AI / 3/20/2026
Key Points
- The paper introduces GeoAux-Bench, a geometry benchmark consisting of 4,334 problems that aligns textual construction steps with corresponding visual updates.
- It shows that interleaved visual-textual aids outperform single-modality approaches by preserving geometric synergy and reducing reasoning perplexity.
- It proposes Action Applicability Policy Optimization (A2PO), a reinforcement learning framework with Adaptive Reward Shaping and counterfactual sampling to regulate when and how visual aids are used.
- Experiments report a 3.51% performance gain over strong baselines; the code and data are released on GitHub.