Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays
arXiv cs.AI / 3/30/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that current medical vision-language pretraining for chest X-rays is limited because it treats radiographs as context-agnostic inputs and largely ignores the gaze patterns radiologists use during diagnosis.
- It introduces CoGaze, a context- and gaze-guided pretraining framework that adds a context-infused vision encoder, multi-level semantic alignment objectives, and disease-aware cross-modal priors (a sketch of the alignment idea follows this list).
- CoGaze uses radiologists’ gaze as probabilistic priors that guide model attention toward diagnostically salient regions, aiming to better reflect real diagnostic workflows (see the gaze-prior sketch after this list).
- Reported experiments show consistent improvements over state-of-the-art baselines across tasks, including gains in free-text and structured report generation, zero-shot classification AUROC, and image-text retrieval metrics.
- The authors release their code publicly for reproducibility and further experimentation with the CoGaze approach.
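
For readers who want a concrete picture of the gaze-as-prior idea, here is a minimal PyTorch sketch under stated assumptions: it treats a radiologist gaze heatmap, pooled to the vision encoder's patch grid and normalized to a probability distribution, as a soft target for the encoder's attention via a KL penalty. The function name `gaze_prior_loss`, the KL direction, and the pooling scheme are illustrative assumptions, not the paper's actual objective.

```python
import torch

def gaze_prior_loss(attn_map: torch.Tensor,
                    gaze_map: torch.Tensor,
                    eps: float = 1e-8) -> torch.Tensor:
    """KL(gaze || attention): penalize attention mass placed away from
    regions the radiologist actually fixated on.

    attn_map: (B, N) patch-level attention weights from the vision
              encoder (e.g. the CLS-token attention row, head-averaged).
    gaze_map: (B, N) gaze heatmap pooled onto the same N-patch grid.
    Both are renormalized to distributions over the N patches.
    """
    attn = attn_map / (attn_map.sum(dim=-1, keepdim=True) + eps)
    gaze = gaze_map / (gaze_map.sum(dim=-1, keepdim=True) + eps)
    # KL(gaze || attn) = sum_i gaze_i * (log gaze_i - log attn_i)
    kl = (gaze * (torch.log(gaze + eps) - torch.log(attn + eps))).sum(dim=-1)
    return kl.mean()

# Toy usage: a 14x14 patch grid flattened to N = 196 positions.
attn = torch.softmax(torch.randn(4, 196), dim=-1)  # stand-in encoder attention
gaze = torch.softmax(torch.randn(4, 196), dim=-1)  # stand-in pooled gaze maps
loss = gaze_prior_loss(attn, gaze)
```

In practice a term like this would be added to the pretraining loss with a small weight, so gaze guides, rather than dictates, where the encoder attends.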
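
Similarly, "multi-level semantic alignment" plausibly combines a global image-report contrastive loss with a finer region-sentence term. The sketch below pairs a standard symmetric InfoNCE objective with a best-match local term; the function names and the exact local formulation are assumptions for illustration, not CoGaze's published losses.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb: torch.Tensor, txt_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired image/report embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multi_level_alignment(global_img, global_txt,
                          region_emb, sent_emb) -> torch.Tensor:
    """Global image<->report loss plus a local term that aligns each
    report sentence with its best-matching image region.

    global_img, global_txt: (B, D) pooled embeddings.
    region_emb: (B, R, D) region features; sent_emb: (B, S, D) sentences.
    """
    global_loss = info_nce(global_img, global_txt)
    # (B, S, R): similarity of every sentence to every region, per pair.
    sim = torch.einsum('bsd,brd->bsr',
                       F.normalize(sent_emb, dim=-1),
                       F.normalize(region_emb, dim=-1))
    # Pull each sentence toward its single best-matching region.
    local_loss = (1.0 - sim.max(dim=-1).values).mean()
    return global_loss + local_loss
```

A best-match local term is only one common choice; attention-weighted pooling over regions is an equally plausible reading of a region-sentence alignment objective.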