GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

arXiv cs.CV / 3/25/2026


Key Points

  • The paper introduces GeoTikzBridge, a framework that improves multimodal LLMs’ fine-grained geometric perception and visual reasoning by generating TikZ-based code.
  • It presents two models—GeoTikzBridge-Base trained on a 2.5M-pair GeoTikz-Base dataset built via iterative expansion, and GeoTikzBridge-Instruct fine-tuned on a first-of-its-kind instruction-augmented GeoTikz-Instruct dataset for visual reasoning.
  • Experiments report state-of-the-art performance among open-sourced multimodal LLMs on geometric-related tasks, specifically addressing limitations in capturing detailed geometric structure.
  • The authors claim the GeoTikzBridge models can be used as plug-and-play reasoning modules with other MLLMs to boost geometric problem-solving performance.
  • The associated datasets and code are released publicly via GitHub, enabling external replication and downstream integration.
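To make the image-to-TikZ pairing concrete, here is a hypothetical illustration of the kind of pair such a dataset might contain (not drawn from the paper's data): the TikZ source makes the figure's coordinates, incidence relations, and labels explicit as text, which is what lets a language model reason over fine-grained geometric structure.

```latex
% Hypothetical image-to-TikZ pair: a right triangle ABC,
% with the right angle at B, rendered from explicit coordinates.
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  \coordinate (A) at (0,0);
  \coordinate (B) at (4,0);
  \coordinate (C) at (4,3);
  \draw (A) -- (B) -- (C) -- cycle;
  % Right-angle mark at B
  \draw (3.6,0) -- (3.6,0.4) -- (4,0.4);
  \node[below left]  at (A) {$A$};
  \node[below right] at (B) {$B$};
  \node[above right] at (C) {$C$};
\end{tikzpicture}
\end{document}
```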

Abstract

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, which constrains their geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through TikZ-based code generation. Within this framework, we build two models supported by two complementary datasets. The GeoTikzBridge-Base model is trained on the GeoTikz-Base dataset, the largest image-to-TikZ dataset to date with 2.5M pairs (16× larger than existing open-source datasets), constructed via iterative data expansion and a localized geometric transformation strategy. Subsequently, GeoTikzBridge-Instruct is fine-tuned on the GeoTikz-Instruct dataset, the first instruction-augmented TikZ dataset supporting visual reasoning. Extensive experimental results demonstrate that our models achieve state-of-the-art performance among open-source MLLMs. Furthermore, the GeoTikzBridge models can serve as plug-and-play reasoning modules for any MLLM (or LLM), enhancing reasoning performance in geometric problem-solving. Datasets and code are publicly available at: https://github.com/sjy-1995/GeoTikzBridge-Advancing-Multimodal-Code-Generation-for-Geometric-Perception-and-Reasoning.
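The plug-and-play use described in the abstract can be sketched as a two-stage pipeline: the bridge model renders the figure into TikZ text, which is then prepended to the question for any downstream LLM. Below is a minimal sketch with stubbed model calls; `generate_tikz` and `build_prompt` are hypothetical placeholders for illustration, not the paper's actual API.

```python
def generate_tikz(image_path: str) -> str:
    """Stub for a GeoTikzBridge call: image -> TikZ source.

    A real pipeline would run the GeoTikzBridge-Base model here;
    this stub returns a canned snippet purely for illustration.
    """
    return r"\draw (0,0) -- (4,0) -- (4,3) -- cycle;  % right triangle"


def build_prompt(image_path: str, question: str) -> str:
    """Compose a text-only prompt for a downstream LLM.

    The generated TikZ code carries the fine-grained geometric
    structure (coordinates, incidence) that the raw image does not
    expose to a text-only model.
    """
    tikz = generate_tikz(image_path)
    return (
        "The following TikZ code describes a geometric figure:\n"
        f"{tikz}\n"
        f"Question: {question}"
    )


prompt = build_prompt("triangle.png", "What is the length of the hypotenuse?")
print(prompt)
```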