Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting

arXiv cs.LG / 4/24/2026


Key Points

  • The paper addresses the scarcity of high-quality annotated data for vision-language medical report generation by proposing a weighted loss that improves sample efficiency.
  • Instead of treating all token prediction errors equally, the reweighted objective emphasizes semantically and clinically salient tokens.
  • Experiments on ophthalmological report generation show that the token reweighting approach can reach similar report quality while using up to 10× less training data.
  • The findings suggest that a simple change to the training objective can improve efficiency across different data scales without requiring fundamentally new model architectures.

Abstract

Training vision-language models (VLMs) for medical report generation is often hindered by the scarcity of high-quality annotated data. This work evaluates the use of a weighted loss function to improve data efficiency. Compared to standard cross-entropy loss, which treats all token prediction errors equally, the reweighted loss shifts the focus to semantically salient tokens with outsized clinical importance. In experiments on ophthalmological report generation, we show that this simple method improves efficiency across multiple data scales, achieving similar report quality with up to ten times less training data.
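The reweighted objective described above can be sketched as a per-token weighted cross-entropy. The snippet below is a minimal illustration, not the paper's implementation: the weight values and the example vocabulary are hypothetical, and how salient tokens are identified (and how heavily they are weighted) is up to the method's actual design.

```python
import math

def weighted_token_ce(log_probs, targets, token_weights=None):
    """Token-reweighted cross-entropy.

    log_probs:     list of per-position log-probability vectors over the vocab
    targets:       list of target token ids, one per position
    token_weights: dict mapping token id -> weight; unlisted tokens get 1.0
                   (uniform weights recover standard cross-entropy)
    Returns the weight-normalized mean negative log-likelihood.
    """
    token_weights = token_weights or {}
    total, norm = 0.0, 0.0
    for lp, t in zip(log_probs, targets):
        w = token_weights.get(t, 1.0)  # upweight clinically salient tokens
        total += -w * lp[t]
        norm += w
    return total / norm

# Toy example: token 0 is a filler word, token 1 a clinical finding.
# The model predicts the filler well (NLL 0.1) but the finding poorly (NLL 0.2).
log_probs = [[-0.1, -2.4],   # position 0, target = 0
             [-2.0, -0.2]]   # position 1, target = 1
targets = [0, 1]

uniform = weighted_token_ce(log_probs, targets)                    # 0.15
salient = weighted_token_ce(log_probs, targets, {1: 5.0})          # ~0.183
```

With uniform weights the loss is the ordinary mean NLL; upweighting the poorly predicted clinical token raises the loss, so gradient updates concentrate on exactly the tokens that carry diagnostic content.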
