GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways
arXiv cs.CV / 3/31/2026
Key Points
- The paper argues that fixed residual connections in deep ConvNets can limit learning because they cannot adapt gradient flow or feature emphasis to input complexity and task relevance across depth.
- It proposes GradAttn, a hybrid CNN–transformer approach that replaces fixed residual shortcuts with self-attention–controlled gradient flow using multi-scale CNN features.
- Experiments on eight datasets (spanning natural images, medical imaging, and fashion recognition) show GradAttn variants outperform ResNet-18 on five of them, with up to a +11.07% accuracy gain on FashionMNIST at comparable model size.
- Gradient flow analysis suggests that controlled, attention-induced instabilities may correlate with better generalization, challenging the assumption that maximal gradient stability is always optimal.
- The study also finds positional encoding effectiveness is dataset-dependent, with CNN hierarchies sometimes providing sufficient spatial structure on their own.
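The central mechanism described above, swapping the fixed identity shortcut of a residual block (y = x + F(x)) for a learned, attention-controlled gate on the transformed path, can be sketched in a few lines. The paper's exact architecture is not reproduced here; this is a minimal NumPy illustration under the assumption that the gate comes from a query/key interaction between the shortcut and transformed features, and all function names, projections, and shapes are hypothetical.

```python
import numpy as np

def fixed_residual(x, F):
    # Standard ResNet block: the shortcut weight is fixed at 1,
    # so gradients always flow through the identity path unchanged.
    return x + F(x)

def gated_residual(x, F, Wq, Wk):
    # Hypothetical attention-controlled shortcut: a scalar gate per
    # example, derived from query/key projections of the block's own
    # features, modulates how much of the transformed path is mixed in.
    h = F(x)
    q = x @ Wq                                  # queries from shortcut path
    k = h @ Wk                                  # keys from transformed path
    scores = (q * k).sum(-1, keepdims=True) / np.sqrt(q.shape[-1])
    gate = 1.0 / (1.0 + np.exp(-scores))        # sigmoid gate in (0, 1)
    return x + gate * h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # batch of 4 feature vectors
W = rng.normal(size=(8, 8)) * 0.1
F = lambda v: np.maximum(v @ W, 0.0)            # toy stand-in for the conv branch
Wq = rng.normal(size=(8, 8)) * 0.1
Wk = rng.normal(size=(8, 8)) * 0.1

y_fixed = fixed_residual(x, F)
y_gated = gated_residual(x, F, Wq, Wk)
print(y_fixed.shape, y_gated.shape)             # both (4, 8)
```

Because the gate is input-dependent, backpropagation through `gate * h` lets the network attenuate or amplify the residual branch per example, which is the adaptivity the fixed shortcut lacks.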



