LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation
arXiv cs.CV / 3/19/2026
Key Points
- LoGSAM proposes a modular, speech-to-segmentation pipeline that converts radiologist dictation into text prompts to drive text-conditioned MRI tumor localization and segmentation.
- The method chains Whisper ASR, negation-aware clinical NLP, and a LoRA-adapted Grounding DINO to generate bounding boxes, updating only 5% of the model's parameters.
- The predicted bounding boxes are used to prompt MedSAM to produce pixel-level tumor masks without additional fine-tuning, preserving pretrained cross-modal knowledge.
- On BRISC 2025, it achieves a state-of-the-art Dice score of 80.32%. On 12 unseen German dictations it reaches 91.7% case-level accuracy, which suggests the pipeline generalizes beyond its training data, although the sample is small.
- The work demonstrates a feasible, parameter-efficient way to adapt foundation models to medical imaging, potentially reducing data annotation needs and letting clinicians drive segmentation through dictation.
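The two metrics quoted above can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `dice_score` and `case_level_accuracy`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for illustration. The 11-of-12 figure is only arithmetic that happens to round to the reported 91.7%; the summary does not state the per-case counts.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same shape")
    # Pixels marked as tumor in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def case_level_accuracy(correct_cases, total_cases):
    """Fraction of cases judged correct (hypothetical helper)."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return correct_cases / total_cases


# Toy 2x3 masks (made up for illustration, not data from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]

print(f"Dice: {dice_score(pred, target):.4f}")
print(f"Case-level accuracy: {100 * case_level_accuracy(11, 12):.1f}%")
```

Dice is computed per mask pair as twice the overlap divided by the combined size of both masks, so a benchmark score such as 80.32% is typically an average of this quantity over test cases.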