Gram-MMD: A Texture-Aware Metric for Image Realism Assessment

Key Points

The paper introduces Gram-MMD (GMMD), a new texture-aware metric for assessing the realism of generated images by comparing Gram-matrix feature correlations between real and generated distributions.

Abstract

Evaluating the realism of generated images remains a fundamental challenge in generative modeling. Existing distributional metrics such as the Fréchet Inception Distance (FID) and CLIP-MMD (CMMD) compare feature distributions at a semantic level but may overlook fine-grained textural information that can be relevant for distinguishing real from generated images. We introduce Gram-MMD (GMMD), a realism metric that leverages Gram matrices computed from intermediate activations of pretrained backbone networks to capture correlations between feature maps. GMMD extracts the upper-triangular part of these symmetric Gram matrices as a descriptor that encodes textural and structural characteristics at a finer granularity than global embeddings, then measures the Maximum Mean Discrepancy (MMD) between an anchor distribution of real images and an evaluation distribution. To select the hyperparameters of the metric, we employ a meta-metric protocol based on controlled degradations applied to MS-COCO images, measuring monotonicity via Spearman's rank correlation and Kendall's tau. We conduct experiments on both the KADID-10k database and the RAISE realness assessment dataset using various backbone architectures, including DINOv2, DC-AE, Stable Diffusion's VAE encoder, VGG19, and the AlexNet backbone from LPIPS, among others. We also demonstrate on a cross-domain driving scenario (KITTI / Virtual KITTI / Stanford Cars) that CMMD can incorrectly rank real images as less realistic than synthetic ones due to its semantic bias, while GMMD preserves the correct ordering. Our results suggest that GMMD captures complementary information to existing semantic-level metrics.
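The pipeline described in the abstract (Gram matrix of feature maps, upper-triangular extraction, MMD between an anchor set and an evaluation set) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`gram_descriptor`, `gaussian_mmd2`, `gmmd`), the Gaussian kernel, the median-heuristic bandwidth, the normalization of the Gram matrix by the number of spatial positions, and the biased MMD estimator are all assumptions, since the abstract does not specify them. Random arrays stand in for the intermediate activations of a pretrained backbone.

```python
import numpy as np


def gram_descriptor(feats):
    """Map one activation tensor of shape (C, H, W) to a vector of
    length C * (C + 1) / 2 holding the upper triangle of its Gram matrix."""
    c, h, w = feats.shape
    flat = feats.reshape(c, h * w)
    # Channel-by-channel correlations; dividing by H * W is an assumption.
    gram = flat @ flat.T / (h * w)
    # The Gram matrix is symmetric, so the upper triangle carries everything.
    return gram[np.triu_indices(c)]


def _sq_dists(x, y):
    """Pairwise squared Euclidean distances between rows of x and rows of y."""
    sq = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.maximum(sq, 0.0)  # clip tiny negatives from round-off


def gaussian_mmd2(x, y, sigma):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        return np.exp(-_sq_dists(a, b) / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()


def gmmd(anchor_feats, eval_feats, sigma=None):
    """Hypothetical GMMD-style score between two sets of (C, H, W) activations."""
    x = np.stack([gram_descriptor(f) for f in anchor_feats])
    y = np.stack([gram_descriptor(f) for f in eval_feats])
    if sigma is None:
        # Median heuristic over pooled pairwise distances (an assumption).
        pooled = np.vstack([x, y])
        iu = np.triu_indices(len(pooled), k=1)
        med = np.median(np.sqrt(_sq_dists(pooled, pooled)[iu]))
        sigma = med if med > 0 else 1.0
    return gaussian_mmd2(x, y, sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for backbone activations: 16 images, 4 channels, 8x8 maps.
    anchor = rng.normal(size=(16, 4, 8, 8))
    evaluated = 3.0 * rng.normal(size=(16, 4, 8, 8))
    print("anchor vs anchor:   ", gmmd(anchor, anchor))
    print("anchor vs evaluated:", gmmd(anchor, evaluated))
```

In practice the activations would come from an intermediate layer of one of the backbones listed above, and the layer choice and kernel bandwidth would be the kind of hyperparameters that the meta-metric protocol is meant to select.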
