SAT: Selective Aggregation Transformer for Image Super-Resolution
arXiv cs.CV / 4/10/2026
Key Points
- The paper introduces the Selective Aggregation Transformer (SAT) for image super-resolution. SAT aims to avoid the quadratic cost of standard self-attention, which grows with the square of the token count, while still modeling long-range dependencies.
- SAT selectively aggregates key-value representations, cutting the number of key-value tokens by a reported 97%. The query matrix stays at full resolution to maintain reconstruction fidelity.
- A Density-driven Token Aggregation algorithm selects cluster representatives using density and isolation metrics, with the goal of preserving critical high-frequency image details.
- Experiments report that SAT outperforms the prior state of the art (PFT) by up to 0.22 dB and can cut total FLOPs by up to 27%.
- The authors position the approach as a scalable way to model global interactions, making transformer-based super-resolution more efficient without major quality trade-offs.
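To illustrate why keeping full-resolution queries while shrinking keys and values lowers cost, here is a minimal NumPy sketch. It is not SAT's implementation: the function name `selective_attention`, the `group_size` parameter, and the use of plain average pooling over consecutive tokens are hypothetical stand-ins for the paper's density-driven aggregation. With 1,024 tokens pooled into 32 key-value tokens, the score matrix shrinks from 1024 x 1024 to 1024 x 32, a reduction of about 97% that mirrors the reported figure.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def selective_attention(x, w_q, w_k, w_v, group_size):
    """Attention with full-resolution queries and reduced keys/values.

    x: (N, C) token features. Keys/values are averaged over consecutive
    groups of `group_size` tokens (a hypothetical stand-in for SAT's
    density-driven aggregation), so scores are (N, M) instead of (N, N).
    """
    n, c = x.shape
    assert n % group_size == 0, "N must be divisible by group_size"
    q = x @ w_q                                   # (N, d): every query kept
    pooled = x.reshape(n // group_size, group_size, c).mean(axis=1)  # (M, C)
    k = pooled @ w_k                              # (M, d)
    v = pooled @ w_v                              # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (N, M)
    return softmax(scores, axis=-1) @ v, scores.shape

rng = np.random.default_rng(0)
N, C, d = 1024, 32, 32
x = rng.standard_normal((N, C))
w_q, w_k, w_v = (rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3))
out, score_shape = selective_attention(x, w_q, w_k, w_v, group_size=32)
print(out.shape, score_shape)  # (1024, 32) (1024, 32)
print(f"KV token reduction: {1 - score_shape[1] / N:.1%}")
```

Because every query row is still computed, each output token gets its own attention result; only the set of tokens it attends over is compressed.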
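The summary does not spell out how the density and isolation metrics are defined, so the following is only a hedged sketch in the spirit of density-peaks clustering, which also ranks points by local density and by distance to the nearest denser point. The function `density_driven_aggregate`, the k-nearest-neighbour density estimate, and the density x isolation score are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def density_driven_aggregate(x, num_clusters, k=5):
    """Aggregate N tokens into `num_clusters` representative tokens.

    Hypothetical sketch (not SAT's exact method):
    density   = exp(-mean squared distance to the k nearest neighbours)
    isolation = distance to the nearest token with strictly higher density
    Tokens with the largest density * isolation become cluster centres.
    Assumes num_clusters <= N.
    """
    # Pairwise Euclidean distances, shape (N, N).
    sq = (x ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0))
    np.fill_diagonal(dist, 0.0)
    # Local density from the k nearest neighbours (column 0 is the token itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    density = np.exp(-(knn ** 2).mean(axis=1))
    # higher[i, j] is True when token j is denser than token i.
    higher = density[None, :] > density[:, None]
    isolation = np.where(higher, dist, np.inf).min(axis=1)
    # The densest token(s) have no denser neighbour: use the max distance.
    isolation[np.isinf(isolation)] = dist.max()
    # Highest-scoring tokens become centres.
    centres = np.argsort(-(density * isolation))[:num_clusters]
    # Assign each token to its nearest centre; each centre owns itself,
    # so no cluster is empty.
    assign = dist[:, centres].argmin(axis=1)
    assign[centres] = np.arange(num_clusters)
    # Representative token = mean feature of each cluster.
    agg = np.stack([x[assign == j].mean(axis=0) for j in range(num_clusters)])
    return agg, assign

rng = np.random.default_rng(0)
tokens = rng.standard_normal((256, 8))
agg, assign = density_driven_aggregate(tokens, num_clusters=8)
print(agg.shape)  # (8, 8)
```

Scoring by density times isolation favours tokens that sit in dense regions yet are far from other dense regions, so the chosen centres tend to cover distinct feature clusters rather than crowding into the single largest one. The aggregated tokens could then serve as the reduced keys and values for attention.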



