Empowering Semantic-Sensitive Underwater Image Enhancement with VLM
arXiv cs.AI / 3/16/2026
Key Points
- The work tackles a core problem in underwater image enhancement (UIE): even high-quality enhanced underwater images exhibit a distribution shift from natural images, which hinders the extraction of semantic cues needed by downstream vision tasks.
- It proposes a learning mechanism in which a Vision-Language Model generates textual descriptions of key objects in the degraded image, and a text-image alignment model maps these descriptions back onto the image to produce a spatial semantic guidance map.
- This semantic guidance map steers the UIE network through a dual-guidance mechanism that combines cross-attention and an explicit alignment loss, focusing restoration on semantically important regions.
- Experiments show that applying the strategy to different UIE baselines significantly improves perceptual quality metrics as well as downstream detection and segmentation performance, demonstrating its adaptability across architectures.
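
The pipeline described in the key points can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code: the module names, tensor shapes, the CLIP-style cosine-similarity map, and the specific form of the alignment loss are all assumptions made for clarity.

```python
# Hypothetical sketch: text descriptions -> spatial guidance map -> dual guidance.
# Shapes, modules, and loss form are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def semantic_guidance_map(patch_feats, text_feats):
    """Map text embeddings of key objects back onto image patches.

    patch_feats: (B, N, D) patch embeddings of the degraded image
    text_feats:  (B, K, D) embeddings of VLM-generated object descriptions
    Returns a (B, 1, H, W) spatial map (assumes N = H*W, square grid).
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = patch_feats @ text_feats.transpose(1, 2)   # (B, N, K) patch-text cosine sims
    sim = sim.max(dim=-1).values                     # best-matching description per patch
    B, N = sim.shape
    H = W = int(N ** 0.5)
    return sim.softmax(dim=-1).view(B, 1, H, W)      # normalized spatial weights

class DualGuidance(nn.Module):
    """Cross-attention steered by the guidance map, plus an explicit alignment loss."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, img_tokens, text_feats, guide_map):
        # cross-attention: image tokens attend to the text descriptions
        attended, _ = self.attn(img_tokens, text_feats, text_feats)
        # weight the attended features by the spatial guidance map
        w = guide_map.flatten(2).transpose(1, 2)     # (B, N, 1)
        fused = img_tokens + w * attended
        # alignment loss: pull map-weighted features toward the text semantics
        pooled = (w * fused).sum(dim=1) / w.sum(dim=1).clamp_min(1e-6)
        align_loss = 1 - F.cosine_similarity(
            pooled, text_feats.mean(dim=1), dim=-1).mean()
        return fused, align_loss

# toy shapes: 64 patches (8x8 grid), 3 object descriptions, feature dim 32
B, N, K, D = 2, 64, 3, 32
patches, texts = torch.randn(B, N, D), torch.randn(B, K, D)
gmap = semantic_guidance_map(patches, texts)
fused, loss = DualGuidance(D)(patches, texts, gmap)
print(gmap.shape, fused.shape, float(loss))
```

Because the guidance map and the alignment loss are computed from the UIE network's own features, a mechanism of this shape could in principle be bolted onto different baseline enhancers, which is consistent with the cross-model adaptability the experiments report.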




