AI Navigate

Beyond Convolution: A Taxonomy of Structured Operators for Learning-Based Image Processing

arXiv cs.CV / March 13, 2026


Key Points

  • The paper presents a systematic taxonomy of operators that extend or replace standard convolution in learning-based image processing, organizing them into five families: decomposition-based, adaptive weighted, basis-adaptive, integral/kernel, and attention-based operators.
  • For each family, it provides formal definitions, analyzes how each differs from convolution in terms of structure, and discusses which tasks (image-to-image vs image-to-label) each is best suited for.
  • It offers a comparative analysis of all families across dimensions such as linearity, locality, equivariance, and computational cost, and outlines open challenges and future directions.
  • The article positions these alternatives as a guide for researchers and practitioners to rethink model design beyond fixed convolutions, potentially enabling more expressive and adaptable image-processing pipelines.
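One of the structural properties the comparison hinges on is translational equivariance: shifting the input of a standard convolution shifts its output by the same amount, a guarantee most of the alternative operators weaken or give up. A minimal numpy sketch (illustrative only; the operator and shapes here are not taken from the paper) makes this concrete:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2-D cross-correlation ('valid' mode): the fixed, linear,
    locally-averaging operator that the taxonomy takes as its baseline."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

# Translational equivariance: shifting the input shifts the output.
shifted = np.roll(x, shift=1, axis=0)
y = conv2d_valid(x, k)
y_shifted = conv2d_valid(shifted, k)
# Interior rows match after undoing the shift (the first row of the
# shifted output differs because np.roll wraps around the border).
assert np.allclose(y[:-1], y_shifted[1:])
```

Attention-based operators, by contrast, compute output values from content similarity rather than fixed relative positions, so this shift property no longer holds in general.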

Abstract

The convolution operator is the fundamental building block of modern convolutional neural networks (CNNs), owing to its simplicity, translational equivariance, and efficient implementation. However, its structure as a fixed, linear, locally-averaging operator limits its ability to capture structured signal properties such as low-rank decompositions, adaptive basis representations, and non-uniform spatial dependencies. This paper presents a systematic taxonomy of operators that extend or replace the standard convolution in learning-based image processing pipelines. We organise the landscape of alternative operators into five families: (i) decomposition-based operators, which separate structural and noise components through singular value or tensor decompositions; (ii) adaptive weighted operators, which modulate kernel contributions as a function of spatial position or signal content; (iii) basis-adaptive operators, which optimise the analysis bases together with the network weights; (iv) integral and kernel operators, which generalise the convolution to position-dependent and non-linear kernels; and (v) attention-based operators, which relax the locality assumption entirely. For each family, we provide a formal definition, a discussion of its structural properties with respect to the convolution, and a critical analysis of the tasks for which the operator is most appropriate. We further provide a comparative analysis of all families across relevant dimensions -- linearity, locality, equivariance, computational cost, and suitability for image-to-image and image-to-label tasks -- and outline the open challenges and future directions of this research area.
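To make family (ii) concrete: in an adaptive weighted operator the kernel taps are rescaled per pixel by the local signal, so the effective kernel varies across the image and the operator is no longer linear. The sketch below is a toy, bilateral-filter-style illustration under assumed definitions, not the paper's formal operator:

```python
import numpy as np

def adaptive_weighted_filter(x, k, sigma=0.5):
    """Toy adaptive weighted operator: a fixed non-negative averaging
    kernel k is modulated per pixel by a content-dependent Gaussian
    range term (as in a bilateral filter). Hypothetical illustration."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            center = patch[kh // 2, kw // 2]
            # Down-weight taps whose intensity differs from the centre
            # pixel, then renormalise so weights sum to one.
            w = k * np.exp(-((patch - center) ** 2) / (2 * sigma ** 2))
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out

x = np.full((6, 6), 2.0)          # constant image
k = np.ones((3, 3)) / 9           # box kernel
assert np.allclose(adaptive_weighted_filter(x, k), 2.0)
```

Because the weights depend on the patch itself, doubling the input does not simply double the output, which is exactly the departure from linearity that the comparative analysis tracks across the five families.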