PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation
arXiv cs.CV / 3/12/2026
Key Points
- The authors introduce PET-F2I-41K (PET Findings-to-Impression Benchmark), a large-scale dataset of over 41,000 real-world PET/CT reports pairing findings sections with diagnostic impressions.
- They evaluate 27 models, spanning frontier LLMs, open-source generalist models, and medical-domain LLMs, and find zero-shot performance is inadequate.
- They train a domain-adapted 7B model, PET-F2I-7B, by fine-tuning Qwen2.5-7B-Instruct with LoRA, achieving BLEU-4 of 0.708 and a 3x improvement in entity coverage over the strongest baseline.
- They introduce three clinically grounded metrics—Entity Coverage Rate, Uncovered Entity Rate, and Factual Consistency Rate—to measure diagnostic completeness and factual reliability alongside standard NLG metrics.
- The work highlights advantages in cost, latency, and privacy for PET/CT reporting and provides a standardized evaluation framework to accelerate development of reliable clinical reporting systems.
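The three clinically grounded metrics listed above can be illustrated with a minimal sketch. This assumes entities have already been extracted as normalized strings; the authors' actual extraction pipeline and matching rules are not specified here, so exact-match set overlap stands in as a simplification.

```python
# Hedged sketch of Entity Coverage Rate, Uncovered Entity Rate, and
# Factual Consistency Rate over pre-extracted entity strings.
# Exact-match set overlap is an assumption, not the paper's method.

def entity_coverage_rate(ref_entities, gen_entities):
    """Fraction of reference-impression entities that also appear
    in the generated impression (diagnostic completeness)."""
    ref, gen = set(ref_entities), set(gen_entities)
    return len(ref & gen) / len(ref) if ref else 1.0

def uncovered_entity_rate(ref_entities, gen_entities):
    """Fraction of reference entities missing from the generation."""
    return 1.0 - entity_coverage_rate(ref_entities, gen_entities)

def factual_consistency_rate(ref_entities, gen_entities):
    """Fraction of generated entities supported by the reference,
    i.e. not hallucinated (factual reliability)."""
    ref, gen = set(ref_entities), set(gen_entities)
    return len(gen & ref) / len(gen) if gen else 1.0

# Toy example with hypothetical entities:
ref = {"FDG-avid lesion", "right lung", "mediastinal lymph node"}
gen = {"FDG-avid lesion", "right lung", "liver metastasis"}
print(entity_coverage_rate(ref, gen))    # 2 of 3 reference entities covered
print(factual_consistency_rate(ref, gen))  # 2 of 3 generated entities supported
```

Coverage and consistency pull in opposite directions: an over-generating model can score high coverage while hallucinating entities, which is why the paper reports both alongside standard NLG metrics.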