DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation

arXiv cs.CV / 4/22/2026

Key Points

DeltaSeg is a U-shaped encoder–decoder model designed to improve multi-class structural defect segmentation from inspection imagery despite class imbalance and the need for accurate boundary delineation.
The architecture uses tiered attention at multiple stages (SE channel attention in the encoder, Coordinate Attention at the bottleneck and decoder, and a Deep Delta Attention mechanism in skip connections) to suppress nuisance features and enhance spatial focus.
DeltaSeg incorporates depthwise separable convolutions with dilated stages to preserve spatial resolution while increasing the receptive field, and uses ASPP at the bottleneck for multi-scale context.
It applies deep supervision through multi-scale auxiliary heads to strengthen gradient flow and encourage semantically meaningful features at intermediate decoder stages.
On the S2DS (7 classes) and Culvert-Sewer Defect Dataset (CSDD, 9 classes) benchmarks, DeltaSeg outperforms 12 competing segmentation architectures, including U-Net, SegFormer, and Swin-UNet, which the authors present as evidence of generalization across damage types, imaging conditions, and structural geometries.
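
To make the point about depthwise separable convolutions and dilation concrete, here is a small arithmetic sketch. It is not taken from the paper: the channel counts (64 in, 128 out), the 3x3 kernel, and the dilation rates are illustrative assumptions, and the helper names are hypothetical. It compares weight counts for a standard versus a depthwise separable convolution and shows how dilation widens the span of a kernel without adding weights.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return c_in * k * k + c_in * c_out


def effective_kernel(k, dilation):
    """Span, in pixels, covered by a k-tap kernel at the given dilation rate."""
    return dilation * (k - 1) + 1


# Assumed layer sizes, chosen only for illustration.
standard = conv_params(64, 128, 3)                  # 64 * 128 * 9 = 73728
separable = depthwise_separable_params(64, 128, 3)  # 576 + 8192 = 8768

# Dilation enlarges the span while the kernel keeps its 3 x 3 weights.
spans = [effective_kernel(3, d) for d in (1, 2, 4)]  # [3, 5, 9]
```

Under these assumed sizes the separable layer needs roughly 8x fewer weights, and because a dilated stage does not downsample, the feature map keeps its resolution while each output sees a wider neighborhood.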

Abstract

Automated segmentation of structural defects from visual inspection imagery remains challenging due to the diversity of damage types, extreme class imbalance, and the need for precise boundary delineation. This paper presents DeltaSeg, a U-shaped encoder-decoder architecture with a tiered attention strategy that integrates Squeeze-and-Excitation (SE) channel attention in the encoder, Coordinate Attention at the bottleneck and decoder, and a novel Deep Delta Attention (DDA) mechanism in the skip connections. The encoder uses depthwise separable convolutions with dilated stages to maintain spatial resolution while expanding the receptive field. Atrous Spatial Pyramid Pooling (ASPP) at the bottleneck captures multi-scale context. The DDA module refines skip connections through a dual-path scheme combining a learned delta operator for nuisance feature suppression with spatial attention gates conditioned on decoder signals. Deep supervision through multi-scale auxiliary heads further strengthens gradient flow and encourages semantically meaningful features at intermediate decoder stages. We evaluate DeltaSeg on two datasets: the S2DS dataset (7 classes) and the Culvert-Sewer Defect Dataset (CSDD, 9 classes). Across both benchmarks, DeltaSeg consistently outperforms 12 competing architectures including U-Net, SA-UNet, UNet3+, SegFormer, Swin-UNet, EGE-UNet, FPN, and Mobile-UNETR, demonstrating strong generalization across damage types, imaging conditions, and structural geometries.
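
As an illustration of the Squeeze-and-Excitation channel attention used in the encoder, the following is a minimal pure-Python sketch of the generic SE computation (squeeze, excitation, scale). It is not DeltaSeg's implementation: the function name `se_attention`, the toy 2-channel input, the hand-picked weights, and the omission of biases are all assumptions made to keep the example small.

```python
import math


def se_attention(x, w1, w2):
    """Generic SE gating. x: C channels, each an H x W nested list.
    w1: (C // r) x C bottleneck weights, w2: C x (C // r) expansion weights."""
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # producing one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every value in a channel by that channel's gate.
    scaled = [[[g * v for v in r] for r in ch] for g, ch in zip(gates, x)]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2 (values are arbitrary).
x = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[1.0, 1.0]]      # hypothetical weights, reduction ratio r = 2
w2 = [[1.0], [-1.0]]   # hypothetical weights: boost channel 0, damp channel 1

scaled, gates = se_attention(x, w1, w2)
```

With these hand-picked weights the first channel receives a gate close to 1 and the second a gate close to 0, which is the mechanism by which a trained SE block can emphasize informative channels and suppress others.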