Learning to Trade Like an Expert: Cognitive Fine-Tuning for Stable Financial Reasoning in Language Models

arXiv cs.LG / April 21, 2026

Key Points

  • The paper asks whether the financial decision-making competence of large language models deployed as autonomous trading agents generalizes beyond specific market patterns, and how it should be trained and evaluated in noisy markets that lack ground truth.
  • It proposes a structured training and evaluation framework centered on a curated multiple-choice question (MCQ) dataset drawn from classic textbooks and historical markets, verified by an AI committee, enriched with structured reasoning traces, and augmented to reduce shortcut learning (a schema sketch follows this list).
  • The authors introduce a two-stage evaluation protocol that first tests performance on isolated MCQs and then measures generalization via an MCQ-based chronological trading simulation.
  • Experiments across multiple market regimes show that open models trained with the framework can deliver competitive, risk-aware behavior over time and outperform open-source baselines while nearing frontier-model performance at smaller scales.
  • The dataset and evaluation framework are released to enable follow-on research on training and assessing LLM-based financial reasoning.
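
The summary does not reproduce the dataset's actual schema or augmentation pipeline. As a rough illustration of the dataset bullet above, a minimal sketch of one record and one common anti-shortcut augmentation (permuting answer options so that answer position carries no learnable signal); every field and function name here is hypothetical, not taken from the paper:

```python
import random
from dataclasses import dataclass, replace

@dataclass
class MCQRecord:
    """One curated question; all field names are hypothetical."""
    question: str
    options: list[str]       # candidate answers
    answer_index: int        # position of the correct option
    reasoning_trace: str     # structured rationale attached during curation
    source: str              # e.g. a textbook chapter or market episode

def shuffle_options(record: MCQRecord, seed: int) -> MCQRecord:
    """Permute the options so that a positional shortcut
    (e.g. "always pick C") carries no signal."""
    rng = random.Random(seed)
    order = list(range(len(record.options)))
    rng.shuffle(order)
    return replace(
        record,
        options=[record.options[i] for i in order],
        answer_index=order.index(record.answer_index),
    )
```

Option permutation is only one plausible augmentation; paraphrasing question stems or resampling distractors would serve the same anti-shortcut goal.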

Abstract

Recent deployments of large language models (LLMs) as autonomous trading agents raise questions about whether financial decision-making competence generalizes beyond specific market patterns and how it should be trained and evaluated in noisy markets lacking ground truth. We propose a structured framework for training and evaluating such models. Central to our approach is a curated, multiple-choice question (MCQ) dataset derived from classic textbooks and historical markets, verified by an AI committee, enriched with structured reasoning traces, and augmented to reduce shortcut learning. To evaluate whether performance on isolated MCQs generalizes to real-world trading, we introduce a two-stage protocol combining test-set evaluation with an MCQ-based chronological trading simulation. Extensive evaluations across market regimes provide statistically robust evidence that open models trained with our framework exhibit competitive, risk-aware behavior over time, outperform open-source baselines, and approach frontier-model performance at smaller scale. We release the dataset and evaluation framework to support further research.
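
To make the two-stage protocol concrete, here is a minimal sketch, under the assumption (not stated in the paper) that each simulation question is dated and that each option maps to a realized return when acted on; `DatedMCQ`, `mcq_accuracy`, and `chronological_simulation` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class DatedMCQ:
    """A dated decision question; all fields are hypothetical."""
    date: str                 # ISO date of the decision point
    question: str
    options: list[str]
    answer_index: int         # correct option (used in stage 1)
    returns: list[float]      # assumed realized return of acting on each option

def mcq_accuracy(model, test_set: list[DatedMCQ]) -> float:
    """Stage 1: fraction of isolated MCQs the model answers correctly."""
    correct = sum(model(q.question, q.options) == q.answer_index for q in test_set)
    return correct / len(test_set)

def chronological_simulation(model, questions: list[DatedMCQ]) -> float:
    """Stage 2: replay decision MCQs in date order, compounding the
    realized return of whichever option the model picks."""
    equity = 1.0
    for q in sorted(questions, key=lambda q: q.date):
        choice = model(q.question, q.options)
        equity *= 1.0 + q.returns[choice]
    return equity

if __name__ == "__main__":
    # Trivial baseline "model" that always picks the first option.
    qs = [DatedMCQ("2020-03-01", "Hold or sell?", ["hold", "sell"], 1, [-0.05, 0.0]),
          DatedMCQ("2020-04-01", "Buy or wait?", ["buy", "wait"], 0, [0.08, 0.0])]
    always_first = lambda question, options: 0
    print(mcq_accuracy(always_first, qs), chronological_simulation(always_first, qs))
```

The second stage is what separates the protocol from standard benchmarking: decisions are scored in market order rather than independently, which is how the framework probes whether isolated MCQ competence generalizes to behavior over time.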