FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
arXiv cs.AI / 4/7/2026
Key Points
- The paper introduces FeynmanBench, a new benchmark specifically designed to test multimodal LLMs on Feynman-diagram-based physics reasoning rather than only local information extraction.
- The benchmark evaluates multistep capabilities including enforcing conservation laws and symmetries, determining graph topology, translating between diagrammatic and algebraic forms, and constructing scattering amplitudes under defined conventions and gauges.
- An automated pipeline generates diverse Standard Model Feynman diagrams with verifiable topological annotations and corresponding amplitude results, enabling large-scale and reproducible evaluation.
- The dataset covers electromagnetic, weak, and strong interactions, includes 100+ distinct diagram types, and provides 2000+ tasks.
- Experiments show consistent failure modes in leading multimodal LLMs, such as unstable physical-constraint enforcement and incorrect global topological reasoning, underscoring the need for physics-grounded visual reasoning benchmarks.
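To illustrate the kind of physical-constraint check the benchmark targets, here is a minimal sketch of verifying electric-charge conservation at a diagram vertex. The graph representation, field names, and `Vertex` class are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: checking charge conservation at a Feynman-diagram
# vertex. The particle labels, charge table, and Vertex class are
# illustrative, not taken from FeynmanBench itself.
from dataclasses import dataclass, field

# Electric charges (in units of e) for a few Standard Model fields.
CHARGE = {"e-": -1, "e+": +1, "mu-": -1, "mu+": +1, "photon": 0, "W+": +1, "W-": -1}

@dataclass
class Vertex:
    incoming: list = field(default_factory=list)  # particles flowing in
    outgoing: list = field(default_factory=list)  # particles flowing out

def charge_conserved(v: Vertex) -> bool:
    """Net charge flowing in must equal net charge flowing out."""
    q_in = sum(CHARGE[p] for p in v.incoming)
    q_out = sum(CHARGE[p] for p in v.outgoing)
    return q_in == q_out

# QED vertex: an electron emits a photon -- charge is conserved.
qed = Vertex(incoming=["e-"], outgoing=["e-", "photon"])
# Invalid vertex: an electron becomes a positron plus a photon.
bad = Vertex(incoming=["e-"], outgoing=["e+", "photon"])
print(charge_conserved(qed))  # True
print(charge_conserved(bad))  # False
```

A full topology check in the paper's spirit would extend this to every vertex of a diagram graph, alongside momentum conservation and symmetry constraints.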