Weakly supervised multimodal segmentation of acoustic borehole images with depth-aware cross-attention

arXiv cs.CV / 3/24/2026


Key Points

  • The paper proposes a weakly supervised multimodal segmentation framework for acoustic borehole images that leverages depth-aligned well-log data to compensate for scarce dense expert annotations.
  • It improves on traditional threshold/clustering pseudo-label workflows by adding denoising, confidence-aware pseudo-supervision, and physically structured fusion, while preserving their annotation-free character.
  • Experiments show that learned refinement of threshold-guided pseudo-labels delivers the most robust improvement over raw thresholding, denoised thresholding, and latent clustering baselines.
  • Fusion strategy is critical: simple direct concatenation yields limited gains, while depth-aware cross-attention, gated fusion, and confidence-aware modulation substantially improve alignment with the weak supervisory reference.
  • The best-performing model, confidence-gated depth-aware cross-attention (CG-DCA), consistently outperforms threshold-based, image-only, and prior multimodal baselines, with ablations indicating the gains come from confidence-aware fusion and structured local depth interaction rather than sheer model complexity (a sketch of such a block follows this list).
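
To make the fusion idea concrete, here is a minimal PyTorch sketch of what a confidence-gated depth-aware cross-attention block could look like. The paper's code is not reproduced here, so every name (`ConfidenceGatedDepthCrossAttention`, the window size, the row-wise pooling) is an illustrative assumption: image features attend to depth-aligned log features only within a local depth window, and a learned sigmoid gate decides how much log-derived context to admit at each depth.

```python
# Illustrative sketch of confidence-gated depth-aware cross-attention (CG-DCA).
# All module and tensor names are assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class ConfidenceGatedDepthCrossAttention(nn.Module):
    """Image features (B, C, H, W) attend to depth-aligned log features
    (B, L, H) within a local depth window; a learned confidence gate
    modulates how much log-derived signal is admitted per depth row."""

    def __init__(self, img_dim: int, log_dim: int, window: int = 9):
        super().__init__()
        self.window = window  # local depth neighborhood, in image rows
        self.q_proj = nn.Linear(img_dim, img_dim)
        self.k_proj = nn.Linear(log_dim, img_dim)
        self.v_proj = nn.Linear(log_dim, img_dim)
        # The gate sees the image feature and the attended log context jointly.
        self.gate = nn.Sequential(nn.Linear(2 * img_dim, img_dim), nn.Sigmoid())

    def forward(self, img_feat: torch.Tensor, log_feat: torch.Tensor) -> torch.Tensor:
        B, C, H, W = img_feat.shape
        # Collapse the azimuth axis so each depth row queries the logs once: (B, H, C)
        q = self.q_proj(img_feat.mean(dim=3).transpose(1, 2))
        kv = log_feat.transpose(1, 2)                  # (B, H, L)
        k, v = self.k_proj(kv), self.v_proj(kv)        # (B, H, C)

        # Depth-aware masking: row i may only attend to rows |i - j| <= window//2.
        idx = torch.arange(H, device=img_feat.device)
        mask = (idx[None, :] - idx[:, None]).abs() > self.window // 2
        attn = torch.einsum("bic,bjc->bij", q, k) / C ** 0.5
        attn = attn.masked_fill(mask, float("-inf")).softmax(dim=-1)
        ctx = torch.einsum("bij,bjc->bic", attn, v)    # (B, H, C)

        # Confidence gate: suppress the log contribution where it adds
        # little beyond (or conflicts with) the image evidence.
        g = self.gate(torch.cat([q, ctx], dim=-1))     # (B, H, C) in [0, 1]
        fused_rows = (g * ctx).transpose(1, 2).unsqueeze(3)  # (B, C, H, 1)
        return img_feat + fused_rows                   # broadcast along azimuth

# Example: fuse 64-channel image features with 8 depth-aligned log curves.
block = ConfidenceGatedDepthCrossAttention(img_dim=64, log_dim=8)
out = block(torch.randn(2, 64, 128, 32), torch.randn(2, 8, 128))
```

The local-window mask is one simple way to realize "structured local depth interaction": it prevents a log reading from influencing image rows far away in depth, which matches the physical depth alignment between the two modalities.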

Abstract

Acoustic borehole images provide high-resolution borehole-wall structure, but large-scale interpretation remains difficult because dense expert annotations are rarely available and subsurface information is intrinsically multimodal. The challenge is to develop weakly supervised methods that combine two-dimensional image texture with depth-aligned one-dimensional well logs. Here, we introduce a weakly supervised multimodal segmentation framework that refines threshold-guided pseudo-labels through learned models. This preserves the annotation-free character of classical thresholding and clustering workflows while extending them with denoising, confidence-aware pseudo-supervision, and physically structured fusion. We establish that threshold-guided learned refinement provides the most robust improvement over raw thresholding, denoised thresholding, and latent clustering baselines. Multimodal performance depends strongly on the fusion strategy: direct concatenation provides limited gains, whereas depth-aware cross-attention, gated fusion, and confidence-aware modulation substantially improve agreement with the weak supervisory reference. The strongest model, confidence-gated depth-aware cross-attention (CG-DCA), consistently outperforms threshold-based, image-only, and earlier multimodal baselines. Targeted ablations show that its advantage depends specifically on confidence-aware fusion and structured local depth interaction rather than on model complexity alone. Cross-well analyses confirm that this performance is broadly stable. These results establish a practical, scalable framework for annotation-free segmentation, showing that multimodal improvement is maximized when auxiliary logs are incorporated selectively and in a depth-aware manner.
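
The refinement pipeline the abstract describes (thresholding, denoising, confidence-aware pseudo-supervision) can be sketched in a few lines. The threshold choice, the median-filter and minimum-area denoising, and the distance-to-threshold confidence below are plausible stand-ins, not the paper's exact recipe:

```python
# Sketch of threshold-guided pseudo-labels with denoising and a confidence
# map; all specific choices here are assumptions for illustration.
import numpy as np
from scipy import ndimage

def pseudo_labels(amplitude: np.ndarray, thresh: float):
    """amplitude: 2D acoustic image, low values ~ fractures/voids.
    Returns a denoised binary label map and a confidence map in [0, 1]."""
    raw = amplitude < thresh                                   # raw threshold labels
    # Denoise: remove speckle with a small median filter, then drop
    # connected components below a minimum area.
    den = ndimage.median_filter(raw.astype(np.uint8), size=3).astype(bool)
    lab, n = ndimage.label(den)
    sizes = ndimage.sum(den, lab, index=np.arange(1, n + 1))
    keep = np.isin(lab, np.flatnonzero(sizes >= 20) + 1)
    # Confidence: pixels far from the threshold are trusted more.
    conf = np.clip(np.abs(amplitude - thresh) / (amplitude.std() + 1e-8), 0, 1)
    return keep.astype(np.int64), conf

# Confidence-weighted pseudo-supervision for a segmentation network (PyTorch):
import torch
import torch.nn.functional as F

def weighted_seg_loss(logits, labels, conf):
    # Per-pixel cross-entropy, down-weighted where the pseudo-label is unreliable.
    ce = F.cross_entropy(logits, labels, reduction="none")
    return (conf * ce).sum() / conf.sum().clamp_min(1e-8)
```

A network trained against `weighted_seg_loss` learns from the cheap pseudo-labels while discounting pixels near the threshold, where the weak labels are least trustworthy; this is the sense in which the learned refinement can improve on the raw thresholding that supervises it.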