Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing

arXiv cs.CV / 4/29/2026


Key Points

  • The paper argues that low-level image processing should be evaluated not only by visual fidelity but also by whether semantic content is preserved, since deep learning and generative pipelines can change meaning while keeping perceptual quality.
  • It formalizes a new evaluation task called Semantic Similarity, defining semantic entities and their relationships to support semantic-level assessment of processed images.
  • The authors propose a Triplet-based Semantic Similarity Score (T3S) that models semantics using foreground entities, background entities, and the relations between them, supported by semantic entity extraction and foreground-background disentanglement.
  • Experiments on COCO and SPA-Data show that T3S outperforms fidelity-focused IQA metrics and several semantic baselines, and better tracks semantic changes across different degradations.
  • Overall, the work emphasizes that semantic assessment is increasingly important for modern low-level vision systems where downstream meaning matters as much as appearance.

Abstract

Low-level image processing has long been evaluated mainly from the perspective of visual fidelity. However, with the rise of deep learning and generative models, processed images may preserve perceptual quality while altering semantic content, making conventional Image Quality Assessment (IQA) insufficient for semantic-level assessment. In this paper, we formalize Semantic Similarity as a new evaluation task for low-level image processing, aimed at measuring whether semantic content is preserved after processing. We further present a structured formulation of image semantics based on semantic entities and their relations, and discuss the desired properties and constraints of a valid semantic similarity index. Based on this formulation, we propose the Triplet-based Semantic Similarity Score (T3S), which models image semantics through foreground entities, background entities, and relations. T3S combines semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling. Experiments on COCO and SPA-Data show that T3S consistently outperforms existing fidelity-oriented metrics and representative semantic-level baselines, while better reflecting progressive semantic changes under diverse degradations. These results highlight the importance of semantic assessment in modern low-level vision.
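The paper does not include implementation details in this abstract; the actual T3S involves learned semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling. As a purely illustrative sketch of the triplet idea, assuming (subject, relation, object) triplets have already been extracted from the reference and processed images, a set-overlap score captures the basic intuition that semantics are preserved when the triplets match:

```python
# Hypothetical simplification, NOT the paper's T3S: score semantic
# preservation as Jaccard overlap between triplet sets extracted from
# the reference image and the processed image.

def triplet_similarity(ref_triplets, proc_triplets):
    """Jaccard overlap between two collections of (subject, relation, object) triplets."""
    ref, proc = set(ref_triplets), set(proc_triplets)
    if not ref and not proc:
        return 1.0  # both empty: semantics trivially preserved
    return len(ref & proc) / len(ref | proc)

# Example: processing that erases one semantic relation lowers the score.
ref = [("person", "riding", "bicycle"), ("bicycle", "on", "road")]
proc = [("person", "riding", "bicycle")]
score = triplet_similarity(ref, proc)  # 0.5
```

A fidelity metric could rate such a processed image highly despite the lost relation; a triplet-level score penalizes exactly that semantic change, which is the behavior the paper evaluates on COCO and SPA-Data.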