Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis

arXiv cs.CV / 4/8/2026

Key Points

  • The paper introduces RATNet, a foundation model for gastrointestinal (GI) endoscopy imaging. It targets weaknesses of existing AI systems (limited generalizability, adaptability, robustness, and scalability) that the authors attribute to scarce medical data, domain shift, and heterogeneous annotations.
  • RATNet uses a cyclic pre-training strategy to acquire and transfer knowledge from heterogeneous expert annotations across five GI endoscopy datasets, and it supports fine-tuning, linear probing, and zero-shot transfer.
  • The architecture consists of an encoder, a relevance-knowledge acquisition and transfer (RAT) module, a projector, and a multi-task head. Its analogical reasoning mechanism matches image-derived posterior knowledge to a learned prior knowledge base and transfers the matched knowledge to guide diagnosis.
  • Experiments report RATNet outperforming prior foundation models (e.g., GastroNet and GastroVision) across six scenarios: diagnosis of common GI diseases, few-shot learning for rare diseases, zero-shot transfer to new medical sites, robustness under long-tailed disease distributions, adaptation to novel diseases, and privacy-preserving deployment via federated learning.
  • The authors also claim practical deployment benefits: the approach can automatically integrate heterogeneous annotations without manual label unification, lowers data acquisition costs, and enables privacy-preserving use via federated learning.
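
To make the analogical-matching idea in the key points more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's implementation: the summary does not specify how RATNet scores or combines knowledge, so the cosine similarity, the softmax weighting, the `temperature` parameter, and the names `transfer_knowledge` and `prior_bank` are all hypothetical stand-ins for "match an image-derived (posterior) feature against a learned prior knowledge base and transfer the most relevant entries".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; lower temperature sharpens the weights."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_knowledge(posterior, prior_bank, temperature=0.1):
    """Hypothetical analogical matching (an assumption, not RATNet's method).

    Scores each prior entry by similarity to the posterior feature, turns
    the scores into relevance weights, and returns the weights together
    with the relevance-weighted blend of prior entries.
    """
    scores = [cosine(posterior, entry) for entry in prior_bank]
    weights = softmax(scores, temperature)
    dim = len(prior_bank[0])
    blended = [
        sum(w * entry[i] for w, entry in zip(weights, prior_bank))
        for i in range(dim)
    ]
    return weights, blended


# Toy prior knowledge base with three entries (made-up values).
prior_bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Toy image-derived feature that resembles the first entry most.
posterior = [0.9, 0.1, 0.0]

weights, blended = transfer_knowledge(posterior, prior_bank)
```

In this toy setup the first prior entry receives the largest weight, so the blended vector is dominated by it. A real model would learn both the prior entries and the similarity function rather than fixing them by hand.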

Abstract

Gastrointestinal diseases impose a growing global health burden, and endoscopy is a primary tool for early diagnosis. However, routine endoscopic image interpretation still suffers from missed lesions and limited efficiency. Although AI-assisted diagnosis has shown promise, existing models often lack generalizability, adaptability, robustness, and scalability because of limited medical data, domain shift, and heterogeneous annotations. To address these challenges, we develop RATNet, a foundation model for gastrointestinal endoscopy imaging based on analogical reasoning. RATNet acquires and transfers knowledge from heterogeneous expert annotations across five gastrointestinal endoscopy datasets through a cyclic pre-training strategy. Its architecture consists of an encoder, a relevance-knowledge acquisition and transfer (RAT) module, a projector, and a multi-task head, and supports fine-tuning, linear probing, and zero-shot transfer. Evaluations show that RATNet outperforms existing foundation models, including GastroNet and GastroVision, across six scenarios: diagnosis of common gastrointestinal diseases, few-shot learning for rare diseases, zero-shot transfer to new medical sites, robustness under long-tailed disease distributions, adaptation to novel diseases, and privacy-preserving deployment via federated learning. Its advantage comes from an analogical reasoning mechanism that matches image-derived posterior knowledge to a learned prior knowledge base and transfers relative knowledge to guide diagnosis, improving generalization and resistance to bias. RATNet is open and cost-effective, supports automatic integration of heterogeneous annotations without manual label unification, and reduces data acquisition costs, making it a practical foundation for intelligent gastrointestinal diagnosis, especially in resource-limited settings.
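
The abstract mentions privacy-preserving deployment via federated learning but does not say which aggregation rule is used. As a point of reference only, the sketch below shows federated averaging (FedAvg), a standard baseline in which each site trains locally and shares only model parameters, which a server then averages in proportion to each site's sample count. The function name `fed_avg`, the flat parameter lists, and the site sizes are assumptions made for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of per-site parameter vectors (FedAvg).

    client_weights: one flat list of floats per site, all the same length.
    client_sizes:   number of local training samples at each site.
    Raw images never leave a site; only these parameters are shared.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with made-up parameters and dataset sizes.
site_a = [0.2, 0.4]  # trained on 100 local images
site_b = [0.6, 0.0]  # trained on 300 local images

global_weights = fed_avg([site_a, site_b], [100, 300])
```

With these toy numbers the larger site pulls the average toward its parameters, giving roughly `[0.5, 0.1]`.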