Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion

arXiv cs.CV / 4/21/2026


Key Points

  • The paper argues that photographic composition can be modeled as a flow of visual attention over geometric structure, enabling a representation that is discriminative of layout while remaining robust to semantics.
  • It introduces VFCNet, which fuses saliency and edge cues into a gradient vector field via gradient vector flow (GVF), using a dual-stream GVF design integrated with attention.
  • VFCNet extracts multi-scale flow features using a DINOv3 backbone and attains state-of-the-art results on the PICD benchmark, improving CDA-1 and CDA-2 by 33.1% and 36.1% over the prior best method.
  • The authors show that even a simple classifier built on self-supervised DINOv3 features can outperform more composition-specialized models, highlighting the strength of general-purpose representations.
  • The work is released with accompanying code on GitHub, supporting reproducibility and further experimentation.

Abstract

The reliable computational assessment of photographic composition requires features that are discriminative of spatial layout yet robust to semantic content. This paper proposes a low-level representation grounded in the assumption that composition can be understood as the flow of visual attention across geometric structure. We introduce VFCNet, which fuses saliency and edge information into a gradient vector flow (GVF) field. The model computes dual-stream GVF representations, integrates them via attention, and extracts multi-scale flow features with a DINOv3 backbone. VFCNet achieves state-of-the-art performance on the PICD benchmark (CDA-1: 0.683, CDA-2: 0.629), improving by 33.1% and 36.1% over the previous best method. We also show that a simple classifier on self-supervised DINOv3 features substantially outperforms more sophisticated, composition-specialized models. Code is available at https://github.com/ADadras/VFCNet
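To make the core building block concrete, the following is a minimal NumPy sketch of the classic gradient vector flow computation (Xu & Prince style diffusion of an edge map's gradient field). This is an illustrative reimplementation of standard GVF, not the paper's VFCNet code; the function name, parameters, and the choice of a 5-point Laplacian with replicated borders are assumptions for the sketch.

```python
import numpy as np

def gvf(f, mu=0.2, iters=80):
    """Diffuse the gradient of edge map `f` into a dense gradient
    vector flow field (u, v). `mu` balances smoothness against
    fidelity to the original gradient; `iters` is the number of
    explicit diffusion steps. (Illustrative, not the paper's code.)"""
    fy, fx = np.gradient(f.astype(float))   # image gradient of edge map
    g = fx**2 + fy**2                       # data-term weight |∇f|²
    u, v = fx.copy(), fy.copy()             # initialize flow with ∇f

    def lap(a):
        # 5-point Laplacian with replicated (edge-padded) borders
        p = np.pad(a, 1, mode="edge")
        return (p[:-2, 1:-1] + p[2:, 1:-1]
                + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a)

    for _ in range(iters):
        # Smooth everywhere, but pull back toward ∇f where |∇f|² is large
        u += mu * lap(u) - g * (u - fx)
        v += mu * lap(v) - g * (v - fy)
    return u, v

# Toy usage: a single bright pixel; after diffusion, the flow field
# extends into flat regions, pointing toward the structure.
f = np.zeros((32, 32))
f[16, 16] = 1.0
u, v = gvf(f)
```

The diffusion is what distinguishes GVF from a raw gradient map: far from any edge the raw gradient is zero, while the GVF field still carries directional information toward nearby structure, which is the property the paper exploits as a "flow of visual attention over geometric structure".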