Adaptive Transform Coding for Semantic Compression

arXiv cs.CV, April 30, 2026


Key Points

  • The paper addresses semantic (feature/embedding) compression for vision tasks, shifting from reconstructing images for humans to transmitting compact machine-oriented representations for downstream inference.
  • It proposes an adaptive transform-coding approach for semantic-feature compression based on the conditional rate-distortion function of a Gaussian mixture model.
  • The method selects mode-dependent transforms and quantizers according to the inferred source component, improving efficiency for heterogeneous feature distributions.
  • Experiments on features from common vision backbones and foundation models indicate the approach achieves results better than, or competitive with, state-of-the-art neural compression methods while remaining flexible and interpretable.

Abstract

Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for downstream inference. We propose an adaptive transform-coding method for semantic-feature compression motivated by the conditional rate-distortion function of a Gaussian mixture model. The scheme uses mode-dependent transforms and quantizers selected according to the inferred source component, enabling more efficient coding of heterogeneous feature distributions. Evaluations on features from widely used vision backbones and foundation models show that the proposed method outperforms or is competitive with state-of-the-art neural compression methods while preserving flexibility and interpretability.
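The core mechanism described above, selecting a transform and quantizer per inferred mixture component, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the mixture parameters are synthetic, mode inference is a plain Gaussian log-likelihood comparison, the per-mode transform is the Karhunen-Loève transform (eigenbasis of each component's covariance), and the quantizer is uniform with a single assumed step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component Gaussian mixture over 8-dim semantic features.
dim, num_modes = 8, 2
means = rng.normal(size=(num_modes, dim))
covs = []
for _ in range(num_modes):
    a = rng.normal(size=(dim, dim))
    covs.append(a @ a.T + dim * np.eye(dim))  # random SPD covariance

# Mode-dependent transforms: the KLT (eigenbasis of each component's
# covariance) decorrelates features drawn from that component.
transforms = [np.linalg.eigh(c)[1].T for c in covs]

def infer_mode(x):
    """Pick the mixture component with the highest (unnormalized) log-likelihood."""
    scores = []
    for k in range(num_modes):
        d = x - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        scores.append(-0.5 * (d @ np.linalg.solve(covs[k], d) + logdet))
    return int(np.argmax(scores))

def encode(x, step=0.5):
    """Apply the inferred mode's KLT, then quantize coefficients uniformly."""
    k = infer_mode(x)
    coeffs = transforms[k] @ (x - means[k])
    return k, np.round(coeffs / step).astype(int)  # transmit (mode id, indices)

def decode(k, indices, step=0.5):
    """Dequantize and invert the mode-dependent transform."""
    return transforms[k].T @ (indices * step) + means[k]

# Round-trip a sample drawn from component 1.
x = rng.multivariate_normal(means[1], covs[1])
k, idx = encode(x)
x_hat = decode(k, idx)
# Orthogonal transform preserves the L2 norm of the quantization error,
# so the reconstruction error is bounded by sqrt(dim) * step / 2.
err = np.linalg.norm(x - x_hat)
```

The point of the mode switch is that a single global transform cannot decorrelate a heterogeneous mixture: each component gets a basis matched to its own covariance, which is what makes heterogeneous feature distributions cheaper to code.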