Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations

arXiv cs.AI / 4/13/2026


Key Points

  • The paper proposes visual-to-symbolic analytical solution inference (ViSA), aiming to recover a single executable SymPy expression for 2D linear steady-state physical fields from visualizations (and first-order derivatives) plus minimal metadata.
  • It introduces ViSA-R2, which uses a self-verifying, solution-centric reasoning pipeline that hypothesizes solution-family (ansatz) structures, derives parameters, and checks consistency in a physicist-like workflow.
  • The authors release ViSA-Bench, a VLM-ready synthetic benchmark with 30 linear steady-state scenarios and verifiable symbolic/analytical annotations.
  • Evaluation uses multiple metrics—numerical accuracy, expression-structure similarity, and character-level accuracy—and shows ViSA-R2 (with an 8B open-weight Qwen3-VL backbone) outperforming strong open-source baselines and several closed-source frontier VLMs under a standardized protocol.

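The numerical-accuracy metric mentioned above can be sketched as sampling both the predicted and ground-truth expressions on a grid and comparing values. The exact ViSA-Bench metric definitions are not given here, so this is a minimal illustration only; the Laplace-type solution used is a hypothetical example, not drawn from the benchmark.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")

# Illustrative ground-truth field and a model prediction (here identical,
# so the error should be ~0). Both are executable SymPy expressions with
# fully instantiated constants, as the task requires.
truth = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y) / sp.sinh(sp.pi)
pred = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y) / sp.sinh(sp.pi)

# Compile both expressions to fast NumPy callables.
f_truth = sp.lambdify((x, y), truth, "numpy")
f_pred = sp.lambdify((x, y), pred, "numpy")

# Evaluate on a sample grid over the unit square and compare pointwise.
xs, ys = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
err = np.abs(f_pred(xs, ys) - f_truth(xs, ys))
rel_err = err.max() / (np.abs(f_truth(xs, ys)).max() + 1e-12)
print(rel_err)
```

A structure-similarity or character-level metric would instead compare the expression trees or printed strings, but the grid-evaluation step above is the natural way to score numerical agreement.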
Abstract

Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored capability for AI-assisted scientific reasoning. We study visual-to-symbolic analytical solution inference (ViSA) for two-dimensional linear steady-state fields: given field visualizations (and first-order derivatives) plus minimal auxiliary metadata, the model must output a single executable SymPy expression with fully instantiated numeric constants. We introduce ViSA-R2 and align it with a self-verifying, solution-centric chain-of-thought pipeline that follows a physicist-like pathway: structural pattern recognition → solution-family (ansatz) hypothesis → parameter derivation → consistency verification. We also release ViSA-Bench, a VLM-ready synthetic benchmark covering 30 linear steady-state scenarios with verifiable analytical/symbolic annotations, and evaluate predictions by numerical accuracy, expression-structure similarity, and character-level accuracy. Using an 8B open-weight Qwen3-VL backbone, ViSA-R2 outperforms strong open-source baselines and the evaluated closed-source frontier VLMs under a standardized protocol.
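The consistency-verification step of the pipeline can be illustrated symbolically: for a 2D linear steady-state field governed by, say, Laplace's equation, a candidate ansatz can be checked by computing the residual of the governing PDE. This is a minimal sketch of that idea; the candidate expression and the choice of Laplace's equation are illustrative assumptions, not details from the paper.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical candidate from an ansatz-plus-parameters step.
candidate = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y)

# Consistency check: the residual u_xx + u_yy must vanish identically
# for a valid Laplace solution.
residual = sp.simplify(sp.diff(candidate, x, 2) + sp.diff(candidate, y, 2))
print(residual == 0)  # True: the candidate satisfies Laplace's equation
```

In a full self-verifying loop, a nonzero residual (or a mismatch against the sampled field) would trigger a revised ansatz hypothesis rather than a final answer.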