Grounded Multimodal Retrieval-Augmented Drafting of Radiology Impressions Using Case-Based Similarity Search
arXiv cs.AI / 3/23/2026
Key Points
- The paper addresses hallucinations in fully generative radiology report models and proposes a retrieval-augmented generation approach to ground drafts in historical reports.
- It combines multimodal image-text embeddings, case-based similarity retrieval, and citation-constrained draft generation to keep drafts factually aligned with the retrieved cases.
- It builds a multimodal retrieval database from a subset of MIMIC-CXR using CLIP for images and structured impressions for text, enabling scalable nearest-neighbor retrieval with FAISS.
- Retrieved cases are assembled into grounded prompts, with safety mechanisms that enforce citation coverage and trigger a confidence-based refusal when the system is uncertain.
- Experimental results show that multimodal fusion improves retrieval performance (Recall@5 > 0.95) and yields interpretable, citation-traceable drafts, enhancing trust in clinical decision support.
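
The pipeline described in the key points (fused embeddings, nearest-neighbor retrieval, Recall@5, and a citation and confidence gate) can be illustrated with a minimal sketch. It is not the paper's implementation: toy vectors stand in for CLIP embeddings, a brute-force cosine search stands in for FAISS, and the function names (`fuse`, `search`, `recall_at_k`, `citation_coverage`, `gate`), the weighted-concatenation fusion rule, the per-query hit definition of recall, the bracketed `[case_id]` citation format, and both thresholds are assumptions made for illustration.

```python
import math
import re


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_vec, text_vec, image_weight=0.5):
    """Assumed fusion rule: weighted concatenation of normalized embeddings."""
    img = [image_weight * x for x in l2_normalize(image_vec)]
    txt = [(1.0 - image_weight) * x for x in l2_normalize(text_vec)]
    return l2_normalize(img + txt)


def search(index, query, k=5):
    """Brute-force top-k by cosine similarity; returns (case_id, score) pairs."""
    scored = [(case_id, sum(a * b for a, b in zip(vec, query)))
              for case_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Per-query hit: 1.0 if any relevant case is in the top k, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


CITATION = re.compile(r"\[(\w+)\]")  # assumed citation format: [case_id]


def citation_coverage(draft, retrieved_ids):
    """Fraction of draft sentences that cite at least one retrieved case."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    if not sentences:
        return 0.0
    allowed = set(retrieved_ids)
    cited = sum(1 for s in sentences if allowed & set(CITATION.findall(s)))
    return cited / len(sentences)


def gate(draft, hits, min_score=0.8, min_coverage=1.0):
    """Release the draft only if the top hit is similar enough and every
    sentence cites a retrieved case; both thresholds are hypothetical."""
    if not hits or hits[0][1] < min_score:
        return "REFUSED: low retrieval confidence"
    if citation_coverage(draft, [cid for cid, _ in hits]) < min_coverage:
        return "REFUSED: uncited statement in draft"
    return draft


# Toy case database: 2-d "image" and "text" vectors in place of real embeddings.
index = {
    "c1": fuse([1.0, 0.0], [1.0, 0.0]),
    "c2": fuse([0.0, 1.0], [0.0, 1.0]),
    "c3": fuse([1.0, 0.1], [0.9, 0.1]),
}
query = fuse([1.0, 0.0], [1.0, 0.0])
hits = search(index, query, k=2)
draft = "No acute process [c1]. Stable heart size [c3]."
result = gate(draft, hits)
```

In this sketch, a draft is returned unchanged only when each sentence carries a citation to one of the retrieved cases; averaging `recall_at_k` over a set of queries would give a Recall@5-style figure under the assumed definition. A production system would presumably replace `search` with an approximate or exact FAISS index over real embeddings.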