Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening

arXiv cs.CV / 4/8/2026


Key Points

  • The paper proposes a Semantic-Topological Graph Reasoning (STGR) framework for language-guided pulmonary screening that targets ambiguity in clinical text and overlapping anatomical structures in low-contrast scans.
  • STGR combines a large language model (LLaMA-3-V) for reasoning with a vision foundation model (MedSAM) for zero-shot mask delineation, using a Text-to-Vision Intent Distillation (TVID) module to extract diagnostic guidance from free text.
  • It formulates mask selection as a dynamic graph reasoning task, representing candidate lesions as nodes and using spatial/semantic edges to disambiguate complex anatomy.
  • To curb overfitting on limited medical data and keep deployment feasible, the authors introduce Selective Asymmetric Fine-Tuning (SAFT), which updates fewer than 1% of model parameters.
  • Experiments with 5-fold cross-validation on LIDC-IDRI and LNDb report a new state of the art, including an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, a margin of more than 5% over LLM-based baselines such as LISA, and strong cross-fold stability (0.6% DSC variance).
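For readers less familiar with the headline metric: the Dice Similarity Coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The Python sketch below computes it on flattened binary masks and summarizes per-fold scores. It rests on assumptions: the summary does not say how Dice is aggregated (per nodule, per scan, or pooled), nor whether the reported "0.6% DSC variance" is a variance, a standard deviation, or a spread across folds, so the hypothetical helper `cross_fold_summary` returns both variance and standard deviation. The fold values in the usage line are made-up placeholders, not results from the paper.

```python
import statistics


def dice(pred, target):
    """Dice Similarity Coefficient for two binary masks flattened to 0/1 sequences.

    DSC = 2 * |P ∩ T| / (|P| + |T|); two empty masks count as a perfect match (1.0).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


def cross_fold_summary(fold_scores):
    """Hypothetical helper: mean, population variance and population std of per-fold DSC."""
    return {
        "mean": statistics.fmean(fold_scores),
        "variance": statistics.pvariance(fold_scores),
        "std": statistics.pstdev(fold_scores),
    }


# Toy masks: 3 voxels predicted, 3 in the reference, 2 shared, so DSC = 4 / 6
print(round(dice([1, 1, 1, 0, 0], [0, 1, 1, 1, 0]), 4))  # → 0.6667

# Placeholder per-fold scores for illustration only (not the paper's numbers)
print(cross_fold_summary([0.80, 0.82, 0.81]))
```

Reporting both statistics makes the distinction explicit: a variance of 0.006 on a 0-1 Dice scale and a standard deviation of 0.006 describe very different levels of stability.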

Abstract

Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariably leads to severe overfitting. To address these challenges, we propose a novel Semantic-Topological Graph Reasoning (STGR) framework for language-guided pulmonary screening. Our approach elegantly synergizes the reasoning capabilities of large language models (LLaMA-3-V) with the zero-shot delineation of vision foundation models (MedSAM). Specifically, we introduce a Text-to-Vision Intent Distillation (TVID) module to extract precise diagnostic guidance. To resolve anatomical ambiguity, we formulate mask selection as a dynamic graph reasoning problem, where candidate lesions are modeled as nodes and edges capture spatial and semantic affinities. To ensure deployment feasibility, we introduce a Selective Asymmetric Fine-Tuning (SAFT) strategy that updates less than 1% of the parameters. Rigorous 5-fold cross-validation on the LIDC-IDRI and LNDb datasets demonstrates that our framework establishes a new state-of-the-art. Notably, it achieves an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, outperforming leading LLM-based tools like LISA by over 5%. Crucially, our SAFT strategy acts as a powerful regularizer, yielding exceptional cross-fold stability (0.6% DSC variance) and paving the way for robust, context-aware clinical deployment.
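The abstract describes mask selection as graph reasoning over candidate lesions, with edges capturing spatial and semantic affinities, but does not give the formulation. The Python sketch below is a hypothetical illustration of that general idea, not the authors' STGR method. Assumptions: each candidate mask (for example, a MedSAM proposal) is a node carrying a text-relevance score standing in for TVID output; an edge blends a Gaussian affinity on centroid distance with the cosine similarity of mask embeddings; and scores are propagated with a standard label-spreading update, F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2). All names (`Candidate`, `edge_weight`, `rank_candidates`, `sigma_mm`, `mix`, `alpha`) are made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate lesion mask, i.e. a graph node (fields are illustrative assumptions)."""
    name: str
    centroid: tuple     # (z, y, x) position in mm
    embedding: list     # visual feature vector pooled over the mask
    text_score: float   # relevance to the distilled text intent, in [0, 1]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def edge_weight(u, v, sigma_mm=20.0, mix=0.5):
    """Blend a spatial affinity (Gaussian on centroid distance) with a semantic one (cosine)."""
    spatial = math.exp(-math.dist(u.centroid, v.centroid) ** 2 / (2 * sigma_mm ** 2))
    semantic = max(cosine(u.embedding, v.embedding), 0.0)
    return mix * spatial + (1 - mix) * semantic


def rank_candidates(cands, alpha=0.5, steps=50):
    """Label spreading: F <- alpha * S F + (1 - alpha) * Y, S = D^-1/2 W D^-1/2.

    Y holds the text scores; returns (best candidate, propagated scores).
    """
    n = len(cands)
    w = [[edge_weight(cands[i], cands[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetric normalization; isolated nodes (degree 0) get no incoming/outgoing mass.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [c.text_score for c in cands]
    f = y[:]
    for _ in range(steps):  # contraction with factor <= alpha, so this converges
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    best = max(range(n), key=lambda k: f[k])
    return cands[best], f


# Toy graph: A and B are overlapping proposals of one nodule; C is isolated and dissimilar
toy = [
    Candidate("A", (0.0, 0.0, 0.0), [1.0, 0.0], 0.65),
    Candidate("B", (5.0, 0.0, 0.0), [1.0, 0.1], 0.45),
    Candidate("C", (200.0, 0.0, 0.0), [0.0, 1.0], 0.70),
]
best, scores = rank_candidates(toy)
print(best.name)  # → A
```

In the toy graph, C has the highest text score on its own but no nearby, similar neighbors, so propagation ranks it below the mutually supporting pair A and B. Symmetric normalization is used instead of row normalization so that a weakly connected node such as C does not pass all of its score to its single neighbor.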