SAT: Selective Aggregation Transformer for Image Super-Resolution

arXiv cs.CV / 4/10/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces the Selective Aggregation Transformer (SAT) for image super-resolution, aiming to overcome the high quadratic cost of standard self-attention while preserving long-range dependency modeling.
SAT selectively aggregates key-value representations to dramatically reduce the number of tokens (reported as a 97% reduction) while keeping the query matrix at full resolution to maintain reconstruction fidelity.
A Density-driven Token Aggregation algorithm identifies cluster representations using density and isolation metrics to better preserve critical high-frequency image details.
Experiments report that SAT outperforms the prior state of the art (PFT) by up to 0.22 dB and can cut total FLOPs by up to 27%.
The approach is positioned as scalable for global interactions, enabling more efficient transformer-based super-resolution without major quality trade-offs.

Abstract

Transformer-based approaches have revolutionized image super-resolution by modeling long-range dependencies. However, the quadratic computational complexity of vanilla self-attention mechanisms poses significant challenges, often leading to compromises between efficiency and global context exploitation. Recent window-based attention methods mitigate this by localizing computations, but they often yield restricted receptive fields. To mitigate these limitations, we propose Selective Aggregation Transformer (SAT). This novel transformer efficiently captures long-range dependencies, leading to an enlarged model receptive field by selectively aggregating key-value matrices (reducing the number of tokens by 97\%) via our Density-driven Token Aggregation algorithm while maintaining the full resolution of the query matrix. This design significantly reduces computational costs, resulting in lower complexity and enabling scalable global interactions without compromising reconstruction fidelity. SAT identifies and represents each cluster with a single aggregation token, utilizing density and isolation metrics to ensure that critical high-frequency details are preserved. Experimental results demonstrate that SAT outperforms the state-of-the-art method PFT by up to 0.22dB, while the total number of FLOPs can be reduced by up to 27\%.

Black Hat Asia

AI Business

CIA is trusting AI to help analyze intel from human spies

Reddit r/artificial

LLM API Pricing in 2026: I Put Every Major Model in One Table

Dev.to

i generated AI video on a GTX 1660. here's what it actually takes.

Dev.to

The $50,000 Build with MeDo Hackathon is NOW LIVE!

Dev.to

SAT: Selective Aggregation Transformer for Image Super-Resolution

Key Points

Abstract

Related Articles

Black Hat Asia

CIA is trusting AI to help analyze intel from human spies

LLM API Pricing in 2026: I Put Every Major Model in One Table

i generated AI video on a GTX 1660. here's what it actually takes.

The $50,000 Build with MeDo Hackathon is NOW LIVE!

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer