Fine-Tuning Small Reasoning Models for Quantum Field Theory

arXiv cs.LG · April 22, 2026


Key Points

  • The paper presents the first academic fine-tuning study targeting small (~7B) reasoning models specifically for theoretical physics, focusing on how domain reasoning abilities develop during training.
  • Because open-source, verifiable physics training data is scarce, the authors build a robust data generation pipeline that creates synthetic QFT problems and adapts existing human-authored problems for model training.
  • They generate 2,500+ synthetic Quantum Field Theory problems and compile a curated set of human-adapted problems from arXiv and pedagogy sources.
  • Experiments compare Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), evaluating both performance improvements and generalization to other physics domains.
  • The study includes a detailed before/after analysis of chain-of-thought reasoning (including error evolution) and releases the pipeline, verifiable QFT training data, and ~200M tokens of QFT reasoning traces publicly.
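The paper does not spell out how "verifiable" is enforced, but a common pattern for physics problems with closed-form answers is numerical equivalence checking: evaluate the reference and candidate expressions at random values of their free symbols and compare. The sketch below is a minimal, hypothetical illustration of such a verifier (the function name, symbol list, and tolerances are assumptions, not from the paper):

```python
import math
import random


def answers_match(reference: str, candidate: str,
                  symbols=("g",), trials=5, tol=1e-9) -> bool:
    """Numerically test whether two closed-form expressions agree.

    Hypothetical verifier sketch: evaluates both expressions at random
    values of the free symbols, a common lightweight alternative to a
    full computer-algebra equivalence check.
    """
    for _ in range(trials):
        # Random sample for each free symbol; pi is bound to its value.
        env = {name: random.uniform(0.5, 2.0) for name in symbols}
        env["pi"] = math.pi
        try:
            a = eval(reference, {"__builtins__": {}}, env)
            b = eval(candidate, {"__builtins__": {}}, env)
        except Exception:
            # Malformed or non-numeric expression: treat as a mismatch.
            return False
        if not math.isclose(a, b, rel_tol=tol):
            return False
    return True


# Algebraically equal one-loop-style factors, written differently:
print(answers_match("g**2/(16*pi**2)", "g**2/(4*pi)**2"))   # True
print(answers_match("g**2/(16*pi**2)", "g**2/(8*pi**2)"))   # False
```

Random-point evaluation trades exactness for speed and robustness: it accepts any algebraically equivalent surface form the model produces, which matters when grading thousands of free-form symbolic answers.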

Abstract

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because the open-source, verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains-of-thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and ~200M tokens of QFT reasoning traces.
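RL on verifiable data typically reduces each reasoning trace to a scalar reward: extract the model's final answer from the chain of thought and score it against the reference. The paper does not publish its reward code, so the following is a hedged sketch of that standard pattern (the `\boxed{...}` convention and exact-match scoring are assumptions for illustration):

```python
import re
from typing import Optional


def extract_final_answer(trace: str) -> Optional[str]:
    """Pull the last \\boxed{...} expression from a reasoning trace.

    Assumes the model is prompted to box its final answer; nested
    braces inside the box are not handled in this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", trace)
    return matches[-1] if matches else None


def verifiable_reward(trace: str, reference: str) -> float:
    """Binary reward: 1.0 iff the extracted answer matches the reference."""
    ans = extract_final_answer(trace)
    if ans is None:
        return 0.0
    return 1.0 if ans.strip() == reference.strip() else 0.0


trace = r"... so the coupling runs as \boxed{g**2/(16*pi**2)}"
print(verifiable_reward(trace, "g**2/(16*pi**2)"))  # 1.0
```

In practice the string comparison would be replaced by a symbolic or numerical equivalence check, so that algebraically equal but differently written answers still earn reward; the binary structure of the signal is what makes the data "verifiable" for RL.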