Fine-Tuning Small Reasoning Models for Quantum Field Theory

arXiv cs.LG · April 22, 2026


Key Points

  • The paper presents the first academic fine-tuning study targeting small (~7B) reasoning models specifically for theoretical physics, focusing on how domain reasoning abilities develop during training.
  • Because open-source, verifiable physics training data is scarce, the authors build a robust data generation pipeline that creates synthetic QFT problems and adapts existing human-authored problems for model training.
  • They generate 2,500+ synthetic Quantum Field Theory problems and compile a curated set of human-adapted problems from arXiv and pedagogy sources.
  • Experiments compare Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), evaluating both performance improvements and generalization to other physics domains.
  • The study includes a detailed before/after analysis of chain-of-thought reasoning (including error evolution) and releases the pipeline, verifiable QFT training data, and ~200M tokens of QFT reasoning traces publicly.
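The paper does not spell out how "verifiable" is enforced, but a common pattern for physics problems with closed-form answers is numerical equivalence checking: evaluate the reference and candidate expressions at random values of their free symbols and compare. The sketch below is a minimal, hypothetical illustration of such a verifier (the function name, symbol list, and tolerances are assumptions, not from the paper):

```python
import math
import random


def answers_match(reference: str, candidate: str,
                  symbols=("g",), trials=5, tol=1e-9) -> bool:
    """Numerically test whether two closed-form expressions agree.

    Hypothetical verifier sketch: evaluates both expressions at random
    values of the free symbols, a common lightweight alternative to a
    full computer-algebra equivalence check.
    """
    for _ in range(trials):
        # Random sample for each free symbol; pi is bound to its value.
        env = {name: random.uniform(0.5, 2.0) for name in symbols}
        env["pi"] = math.pi
        try:
            a = eval(reference, {"__builtins__": {}}, env)
            b = eval(candidate, {"__builtins__": {}}, env)
        except Exception:
            # Malformed or non-numeric expression: treat as a mismatch.
            return False
        if not math.isclose(a, b, rel_tol=tol):
            return False
    return True


# Algebraically equal one-loop-style factors, written differently:
print(answers_match("g**2/(16*pi**2)", "g**2/(4*pi)**2"))   # True
print(answers_match("g**2/(16*pi**2)", "g**2/(8*pi**2)"))   # False
```

Random-point evaluation trades exactness for speed and robustness: it accepts any algebraically equivalent surface form the model produces, which matters when grading thousands of free-form symbolic answers.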

Abstract

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because the open-source, verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains-of-thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and ~200M tokens of QFT reasoning traces.
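RL on verifiable data typically reduces each reasoning trace to a scalar reward: extract the model's final answer from the chain of thought and score it against the reference. The paper does not publish its reward code, so the following is a hedged sketch of that standard pattern (the `\boxed{...}` convention and exact-match scoring are assumptions for illustration):

```python
import re
from typing import Optional


def extract_final_answer(trace: str) -> Optional[str]:
    """Pull the last \\boxed{...} expression from a reasoning trace.

    Assumes the model is prompted to box its final answer; nested
    braces inside the box are not handled in this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", trace)
    return matches[-1] if matches else None


def verifiable_reward(trace: str, reference: str) -> float:
    """Binary reward: 1.0 iff the extracted answer matches the reference."""
    ans = extract_final_answer(trace)
    if ans is None:
        return 0.0
    return 1.0 if ans.strip() == reference.strip() else 0.0


trace = r"... so the coupling runs as \boxed{g**2/(16*pi**2)}"
print(verifiable_reward(trace, "g**2/(16*pi**2)"))  # 1.0
```

In practice the string comparison would be replaced by a symbolic or numerical equivalence check, so that algebraically equal but differently written answers still earn reward; the binary structure of the signal is what makes the data "verifiable" for RL.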