LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation
arXiv cs.CV / 3/19/2026
Key Points
- LoGSAM proposes a modular, speech-to-segmentation pipeline that converts radiologist dictation into text prompts to drive text-conditioned MRI tumor localization and segmentation.
- The method chains Whisper ASR, negation-aware clinical NLP, and a LoRA-adapted Grounding DINO to generate bounding boxes while updating only 5% of the parameters.
- The predicted bounding boxes are used to prompt MedSAM to produce pixel-level tumor masks without additional fine-tuning, preserving pretrained cross-modal knowledge.
- On BRISC 2025, LoGSAM achieves a state-of-the-art Dice score of 80.32%, and on 12 unseen German dictations it reaches 91.7% case-level accuracy, indicating strong generalization.
- The work demonstrates a feasible, low-parameter adaptation approach for medical imaging with foundation models, potentially reducing data annotation needs and enabling broader clinician input.
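
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a Dice score and a case-level accuracy are commonly computed. It is not the authors' evaluation code. The function name `dice_score`, the toy masks, and the empty-mask convention are assumptions made for illustration, and the 11-of-12 count is only the integer count consistent with the reported 91.7%; the summary does not state it.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists of 0/1."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (hypothetical data, not from BRISC 2025).
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]

dice = dice_score(pred, target)  # 2 shared pixels, 3 + 3 foreground pixels

# Case-level accuracy: fraction of dictations handled correctly.
# 11 of 12 is an assumption consistent with the reported 91.7%.
case_accuracy = round(11 / 12 * 100, 1)

print(f"Dice: {dice:.4f}")
print(f"Case-level accuracy: {case_accuracy}%")
```

Dice rewards overlap relative to the combined size of the predicted and reference masks, so it is stricter on small tumors than plain pixel accuracy. With only 12 dictations, each case shifts the case-level accuracy by roughly 8 percentage points.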