Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion
arXiv cs.CV / 4/21/2026
Key Points
- The paper argues that photographic composition can be modeled as a flow of visual attention over geometric structure, yielding a representation that discriminates between layouts while remaining stable across semantic content.
- It introduces VFCNet, which fuses saliency and edge cues into a gradient vector field via gradient vector flow (GVF), using a dual-stream GVF design integrated with attention.
- VFCNet extracts multi-scale flow features using a DINOv3 backbone and attains state-of-the-art results on the PICD benchmark, improving CDA-1 and CDA-2 by 33.1% and 36.1%, respectively, over the prior best method.
- The authors show that even a simple classifier built on self-supervised DINOv3 features can outperform more composition-specialized models, highlighting the strength of general-purpose representations.
- The work is released with accompanying code on GitHub, supporting reproducibility and further experimentation.
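The summary does not spell out the paper's dual-stream GVF design, but the gradient vector flow operation it builds on is the classic Xu–Prince formulation: the gradient of an edge (or saliency) map is diffused into a smooth vector field that extends far from the edges themselves. A minimal numpy sketch of that standard iteration, with a toy edge map standing in for the paper's fused saliency/edge cues:

```python
import numpy as np

def _laplacian(a):
    """5-point Laplacian with wrap-around borders (keeps the sketch short)."""
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
            np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def gradient_vector_flow(edge_map, mu=0.2, iters=200):
    """Textbook GVF: diffuse the edge-map gradient into a smooth field.
    `mu` trades smoothness against fidelity to the original gradient."""
    fy, fx = np.gradient(edge_map.astype(float))
    mag2 = fx**2 + fy**2            # squared gradient magnitude (data-term weight)
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        # Explicit update of the GVF energy: smooth where gradients are weak,
        # stay close to the raw gradient where they are strong.
        u += mu * _laplacian(u) - mag2 * (u - fx)
        v += mu * _laplacian(v) - mag2 * (v - fy)
    return u, v

# Toy "edge map": a bright square on a dark background.
edges = np.zeros((32, 32))
edges[8:24, 8:24] = 1.0
u, v = gradient_vector_flow(edges)
```

How VFCNet fuses two such streams with attention is specific to the paper; this only illustrates the GVF primitive the Key Points refer to.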
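The "simple classifier on DINOv3 features" baseline is, in the usual reading, a linear probe over frozen self-supervised embeddings. The sketch below substitutes synthetic clustered vectors for real DINOv3 features (which would come from the released model weights, not shown here) and trains a plain softmax-regression probe with gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen DINOv3 embeddings: synthetic clustered features.
# n images, d-dim features, k composition classes (all values hypothetical).
n, d, k = 200, 64, 4
y = rng.integers(0, k, n)
centers = rng.normal(size=(k, d))
X = centers[y] + 0.5 * rng.normal(size=(n, d))

# Linear probe: softmax regression trained by full-batch gradient descent.
W = np.zeros((d, k))
onehot = np.eye(k)[y]
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
    W -= 0.1 * X.T @ (p - onehot) / n          # cross-entropy gradient step

acc = (np.argmax(X @ W, axis=1) == y).mean()
```

With real embeddings, the only change is replacing `X` with backbone outputs; the probe itself stays this simple, which is the point the authors highlight.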