Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports
arXiv cs.AI / 4/22/2026
Key Points
- The study addresses the challenge of accurately classifying diseases from radiology reports, noting that supervised fine-tuning (SFT) can improve accuracy while degrading the quality of reasoning.
- It proposes a two-stage pipeline that first applies SFT using disease labels, then uses Group Relative Policy Optimization (GRPO) to further refine predictions by optimizing for accuracy and output format without explicit reasoning supervision.
- Experiments on three radiologist-annotated datasets show that SFT outperforms baseline methods, and adding GRPO yields additional gains in classification performance.
- The authors report that GRPO also improves aspects of reasoning quality, specifically boosting reasoning recall and comprehensiveness, even though it does not rely on reasoning labels.
- Overall, the work suggests reinforcement learning can mitigate SFT’s trade-off between accuracy and reasoning in domain-specific medical text classification.
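
To make the GRPO stage described above more concrete, here is a minimal Python sketch of a reward that scores only label accuracy and output format, followed by the group-relative normalization that gives GRPO its name. The `<think>`/`<answer>` tag layout, the `format_weight` value, and all function names are illustrative assumptions; the summary does not state the paper's actual prompt format or reward weights.

```python
import re
from statistics import mean, pstdev

# ASSUMPTION: a hypothetical output layout; the source does not specify the real one.
OUTPUT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if OUTPUT_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold_label: str) -> float:
    """Return 1.0 if the extracted answer equals the gold disease label."""
    match = OUTPUT_PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # no parseable answer, so no accuracy credit
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold_label.strip().lower() else 0.0


def total_reward(completion: str, gold_label: str, format_weight: float = 0.5) -> float:
    """Combine both signals. The reasoning text itself is never scored."""
    # ASSUMPTION: format_weight=0.5 is an arbitrary illustrative choice.
    return accuracy_reward(completion, gold_label) + format_weight * format_reward(completion)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same report."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of sampled completions for a single (made-up) report labeled "pneumonia".
group = [
    "<think>Consolidation in the right lower lobe.</think><answer>pneumonia</answer>",
    "<think>No acute findings.</think><answer>normal</answer>",
    "pneumonia",  # correct label but wrong format, so it earns nothing here
]
rewards = [total_reward(c, "pneumonia") for c in group]
advantages = group_relative_advantages(rewards)
```

Because each advantage is measured against the other completions in the same group, no separate value model is required, and a completion is reinforced only when it beats its siblings on accuracy and format. Nothing in this reward inspects the reasoning text, which matches the summary's point that reasoning quality improved without reasoning labels.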