Seeing Through Experts' Eyes: A Foundational Vision-Language Model Trained on Radiologists' Gaze and Reasoning
arXiv cs.AI / 4/17/2026
Key Points
- The paper argues that vision-language models for chest X-rays often fall short because they optimize for semantic correctness without mirroring how radiologists visually inspect and reason over images.
- It introduces GazeX, which uses radiologists’ eye-tracking data as a behavioral prior by incorporating gaze trajectories and fixation patterns into pretraining.
- GazeX is trained on a curated dataset with gaze key frames from five radiologists and evaluated on large-scale radiology report, question-answering, and caption/bounding-box datasets.
- Results claim that GazeX improves accuracy, interpretability, and consistency with expert diagnostic workflows across report generation, disease grounding, and visual question answering.
- Unlike fully autonomous systems, GazeX is designed to output verifiable evidence artifacts such as inspection trajectories and localized findings to support safer human–AI collaboration.
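The paper does not publish its training code, but the idea in the second bullet — using gaze trajectories as a behavioral prior during pretraining — can be sketched as an auxiliary alignment loss. The sketch below is a hypothetical illustration, not the authors' implementation: it rasterizes normalized fixation points into a smoothed density map over an image-patch grid, then penalizes a model's attention map for diverging from that density via KL divergence. The function names (`fixation_density`, `gaze_alignment_loss`) and the 14×14 patch grid are assumptions of this sketch.

```python
import numpy as np

def fixation_density(gaze_xy, grid=(14, 14), sigma=1.0):
    """Rasterize normalized gaze fixations (x, y in [0, 1]) into a
    Gaussian-smoothed density map over a patch grid.
    This serves as the behavioral prior derived from eye tracking."""
    h, w = grid
    density = np.zeros((h, w))
    for x, y in gaze_xy:
        density[min(int(y * h), h - 1), min(int(x * w), w - 1)] += 1.0
    # Separable 1-D Gaussian smoothing along rows then columns.
    ax = np.arange(-3, 4)
    kernel = np.exp(-ax**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    for axis in (0, 1):
        density = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, density)
    return density / max(density.sum(), 1e-8)

def gaze_alignment_loss(attention, gaze_density, eps=1e-8):
    """KL(gaze || attention): penalizes attention mass placed away
    from the regions radiologists actually inspected."""
    p = gaze_density.ravel() + eps
    q = attention.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

In a full pipeline this loss would be added, with some weight, to the usual vision-language objectives (e.g. captioning or contrastive loss), so the model is nudged to attend where experts looked rather than only matching report text.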


