Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

arXiv cs.LG / 4/16/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper proposes Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL) to improve radiology report generation by addressing weak, report-level reward signals for clinical faithfulness.
ESC-RL introduces GEAR (Group-wise Evidence-aware Alignment Reward), which provides group-wise feedback to reinforce true positives, recover false negatives, and suppress unsupported false positives.
It also adds SPL (Self-correcting Preference Learning) that builds a disease-aware preference dataset from multiple noisy observations and uses an LLM to synthesize refined reports without human supervision.
Experiments on two public chest X-ray datasets show consistent performance improvements and state-of-the-art results, suggesting ESC-RL yields more evidence-grounded and preference-aligned outputs during training.

Abstract

Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism to align with clinical preference. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL), comprising two key components. First, a Group-wise Evidence-aware Alignment Reward (GEAR) delivers group-wise, evidence-aware feedback. GEAR reinforces consistent grounding for true positives, recovers missed findings for false negatives, and suppresses unsupported content for false positives. Second, a Self-correcting Preference Learning (SPL) strategy automatically constructs a reliable, disease-aware preference dataset from multiple noisy observations and leverages an LLM to synthesize refined reports without human supervision. ESC-RL promotes clinically faithful, disease-aligned reward and supports continual self-improvement during training. Extensive experiments on two public chest X-ray datasets demonstrate consistent gains and state-of-the-art performance.

The AI Hype Cycle Is Lying to You About What to Learn

Dev.to

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Inside NVIDIA’s $2B Marvell Deal: What NVLink Fusion Means for AI Ethernet Fabrics

Dev.to

Automating Your Literature Review: From PDFs to Data with AI

Dev.to

Why event-driven agents reduce scope, cost, and decision dispersion

Dev.to

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

Key Points

Abstract

Related Articles

The AI Hype Cycle Is Lying to You About What to Learn

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Inside NVIDIA’s $2B Marvell Deal: What NVLink Fusion Means for AI Ethernet Fabrics

Automating Your Literature Review: From PDFs to Data with AI

Why event-driven agents reduce scope, cost, and decision dispersion

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer