PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation
arXiv cs.CV / 3/12/2026
Key Points
- The authors introduce PET-F2I-41K (PET Findings-to-Impression Benchmark), a large-scale dataset of over 41,000 real-world PET/CT reports pairing findings sections with their diagnostic impressions.
- They evaluate 27 models, spanning frontier LLMs, open-source generalist models, and medical-domain LLMs, and find that zero-shot performance is inadequate for this task.
- They train a domain-adapted 7B model, PET-F2I-7B, by fine-tuning Qwen2.5-7B-Instruct with LoRA, achieving BLEU-4 of 0.708 and a 3x improvement in entity coverage over the strongest baseline.
- They introduce three clinically grounded metrics—Entity Coverage Rate, Uncovered Entity Rate, and Factual Consistency Rate—to measure diagnostic completeness and factual reliability alongside standard NLG metrics.
- The work highlights advantages in cost, latency, and privacy for PET/CT reporting and provides a standardized evaluation framework to accelerate development of reliable clinical reporting systems.
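The three clinically grounded metrics can be sketched as simple set operations over entities extracted from the reference and generated impressions. The function below is an illustrative assumption, not the paper's implementation: it treats entities as exact-match strings, whereas the actual metrics likely rely on a medical NER or normalization step.

```python
def entity_metrics(reference_entities, generated_entities):
    """Illustrative set-based versions of the three metrics.

    Entity Coverage Rate (ECR): fraction of reference entities that
    appear in the generated impression.
    Uncovered Entity Rate (UER): fraction of reference entities missed.
    Factual Consistency Rate (FCR): fraction of generated entities
    supported by the reference (a proxy for factual reliability;
    the paper's exact definition may differ).
    """
    ref = set(reference_entities)
    gen = set(generated_entities)
    if not ref or not gen:
        return None  # metrics undefined for empty entity sets
    covered = ref & gen
    ecr = len(covered) / len(ref)
    uer = 1.0 - ecr
    fcr = len(gen & ref) / len(gen)
    return {"ECR": ecr, "UER": uer, "FCR": fcr}

# Hypothetical example: entities extracted from a reference impression
# vs. a model-generated one.
ref = ["hypermetabolic lesion", "right upper lobe", "lymphadenopathy"]
gen = ["hypermetabolic lesion", "right upper lobe", "pleural effusion"]
print(entity_metrics(ref, gen))
```

Under this sketch, ECR and UER measure diagnostic completeness against the reference, while FCR penalizes entities the model introduces without support, which is why the paper reports them alongside surface-level NLG metrics like BLEU-4.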