An Explainable Vision-Language Model Framework with Adaptive PID-Tversky Loss for Lumbar Spinal Stenosis Diagnosis
arXiv cs.AI / 4/6/2026
Key Points
- The paper addresses lumbar spinal stenosis (LSS) diagnosis from multi-view MRI, targeting the delays and inter-observer variability caused by labor-intensive manual interpretation.
- It proposes an end-to-end explainable vision-language model that uses a Spatial Patch Cross-Attention module for text-directed, spatially precise localization of spinal anomalies.
- It introduces an Adaptive PID-Tversky Loss that applies control-theory-inspired, dynamically adjusted penalties to better handle extreme class imbalance and under-segmented minority instances.
- The approach combines foundation VLMs with an Automated Radiology Report Generation module, translating segmentation outputs into radiologist-style clinical reports to improve interpretability.
- Reported results include 90.69% classification accuracy, a macro-averaged Dice score of 0.9512 for segmentation, and a CIDEr score of 92.80% for report generation; the authors claim this sets a new benchmark for transparent, supervised clinical AI.
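The digest does not spell out the internals of the Spatial Patch Cross-Attention module, but the general idea of text-directed localization can be sketched as standard cross-attention in which image patches act as queries over report-text tokens. All shapes and names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_patch_cross_attention(patches, text_tokens):
    """Hypothetical sketch: image patches (queries) attend over text tokens
    (keys/values), so language cues steer which spatial regions are emphasized.

    patches:     (num_patches, d) patch embeddings from the vision encoder
    text_tokens: (num_tokens, d)  token embeddings from the text encoder
    returns:     (num_patches, d) text-conditioned patch features
    """
    d = patches.shape[-1]
    scores = patches @ text_tokens.T / np.sqrt(d)   # (P, T) similarity logits
    attn = softmax(scores, axis=-1)                 # each patch's weights over tokens
    return attn @ text_tokens
```

In a full model the queries, keys, and values would each pass through learned projections and multiple heads; this single-head, projection-free version only illustrates how text can direct spatial emphasis.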
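The article names an "Adaptive PID-Tversky Loss" without giving its formula. A plausible reading, sketched below under stated assumptions, is a standard Tversky loss whose false-negative weight is adjusted by a discrete PID controller driven by the observed miss rate on minority pixels. The gains, clipping range, and coupling `alpha = 1 - beta` are all illustrative choices, not the paper's.

```python
import numpy as np

class PIDController:
    """Discrete PID controller; the gains here are illustrative defaults."""
    def __init__(self, kp=0.5, ki=0.05, kd=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def tversky_loss(pred, target, alpha, beta, eps=1e-6):
    """Tversky loss: alpha weights false positives, beta false negatives."""
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1 - target))
    fn = np.sum((1 - pred) * target)
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def adaptive_pid_tversky(pred, target, pid, base_beta=0.7):
    """Raise the FN penalty via PID feedback when minority pixels are missed."""
    fn_rate = np.sum((1 - pred) * target) / max(np.sum(target), 1e-6)
    beta = float(np.clip(base_beta + pid.step(fn_rate), 0.5, 0.95))
    return tversky_loss(pred, target, alpha=1.0 - beta, beta=beta)
```

The appeal of a PID-style adjustment over a fixed Tversky weighting is that the penalty on under-segmented minority classes responds to both the current miss rate (P), its accumulation over training (I), and its trend (D), rather than being hand-tuned once.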