Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

arXiv cs.AI / 4/22/2026

Key Points

  • The study addresses the challenge of accurately classifying diseases from radiology reports, noting that supervised fine-tuning (SFT) can improve accuracy while degrading the quality of reasoning.
  • It proposes a two-stage pipeline that first applies SFT using disease labels, then uses Group Relative Policy Optimization (GRPO) to further refine predictions by optimizing for accuracy and output format without explicit reasoning supervision.
  • Experiments on three radiologist-annotated datasets show that SFT outperforms baseline methods, and adding GRPO yields additional gains in classification performance.
  • The authors report that GRPO also improves aspects of reasoning quality, specifically boosting reasoning recall and comprehensiveness, even though it does not rely on reasoning labels.
  • Overall, the work suggests reinforcement learning can mitigate SFT’s trade-off between accuracy and reasoning in domain-specific medical text classification.
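
To make the GRPO stage described above more concrete, here is a minimal sketch of a reward that scores only label accuracy and output format, together with the group-relative advantage that GRPO computes over several sampled completions for the same report. The `<think>`/`<answer>` template, the comma-separated label list, the exact-match scoring, and the weights `w_acc` and `w_fmt` are all assumptions for illustration; the summary does not state the paper's actual output format or reward weighting.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template: free-text reasoning, then a comma-separated label list.
# This format is an assumption; the source does not specify the one used in the paper.
_TEMPLATE = re.compile(r"^<think>(.+?)</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold: set) -> float:
    """Return 1.0 if the predicted label set equals the gold label set, else 0.0."""
    m = _TEMPLATE.match(completion.strip())
    if m is None:
        return 0.0
    predicted = {p.strip().lower() for p in m.group(2).split(",") if p.strip()}
    return 1.0 if predicted == {g.lower() for g in gold} else 0.0


def total_reward(completion: str, gold: set, w_acc: float = 1.0, w_fmt: float = 0.5) -> float:
    """Weighted sum of accuracy and format rewards (weights are illustrative).

    Note that the reasoning text itself is never scored: there is no reasoning label.
    """
    return w_acc * accuracy_reward(completion, gold) + w_fmt * format_reward(completion)


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one report whose gold label is "pneumonia".
group = [
    "<think>Right lower lobe consolidation is described.</think><answer>pneumonia</answer>",
    "<think>No acute findings are described.</think><answer>normal</answer>",
    "pneumonia",  # correct label, but it ignores the template, so it earns nothing
]
rewards = [total_reward(c, {"pneumonia"}) for c in group]  # [1.5, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
```

In this sketch the first completion receives a positive advantage and the other two receive negative ones, so a policy update would favor outputs that are both well-formed and correctly labeled. A full GRPO training loop would also need a policy model, a sampling step, and the clipped policy-gradient objective, none of which are shown here.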

Abstract

Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels, followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed baselines, and GRPO further improved classification while also enhancing reasoning recall and comprehensiveness.