Structural Evaluation Metrics for SVG Generation via Leave-One-Out Analysis

arXiv cs.LG / 4/13/2026


Key Points

  • The paper argues that existing SVG generation evaluation largely reduces outputs to image- or text-level similarity, missing whether generated SVGs preserve the structural properties needed for editing.
  • It introduces an element-level leave-one-out (LOO) evaluation method that renders SVGs with and without each element to quantify that element’s contribution to visual quality.
  • From the same LOO mechanism, the authors derive per-element quality scores, which enable zero-shot artifact detection, and concept-element attribution, which links parts of the code to the visual concepts they serve.
  • They also propose four SVG modularity metrics—purity, coverage, compactness, and locality—to evaluate structural organization from multiple complementary angles.
  • The metrics are validated on more than 19,000 edits spanning 5 edit types, 5 generation systems, and 3 complexity tiers, supporting the practicality of structural evaluation beyond simple similarity scores.
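The leave-one-out step in the bullets above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: SVG elements are replaced by a hypothetical Rect type (filled rectangles on a small pixel grid), render is a toy rasterizer standing in for a real SVG renderer, and visual quality Q is intersection-over-union (IoU) against a reference mask rather than whatever similarity measure the paper uses. The score of element i is Delta_i = Q(all elements) − Q(all elements except i); an element whose removal raises quality (Delta_i < 0) is flagged as a candidate artifact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for one SVG element: a filled axis-aligned rectangle."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def render(elements, width, height):
    """Toy rasterizer: the set of (x, y) pixels covered by at least one element."""
    covered = set()
    for r in elements:
        for y in range(max(r.y0, 0), min(r.y1, height)):
            for x in range(max(r.x0, 0), min(r.x1, width)):
                covered.add((x, y))
    return covered


def iou(a, b):
    """Intersection-over-union of two pixel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def loo_scores(elements, reference, width, height):
    """Delta_i = Q(all elements) - Q(all except element i), with Q = IoU to the reference."""
    full = iou(render(elements, width, height), reference)
    scores = []
    for i in range(len(elements)):
        rest = elements[:i] + elements[i + 1:]
        scores.append(full - iou(render(rest, width, height), reference))
    return scores


def flag_artifacts(scores, threshold=0.0):
    """Zero-shot flagging: indices whose LOO score is below threshold (default 0: removal raises quality)."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Usage: the target is a single 4x4 square.
W, H = 8, 8
reference = render([Rect(1, 1, 5, 5)], W, H)
generated = [
    Rect(1, 1, 5, 3),  # top half of the square
    Rect(1, 3, 5, 5),  # bottom half of the square
    Rect(6, 6, 8, 8),  # stray blob outside the target
]
scores = loo_scores(generated, reference, W, H)
print(flag_artifacts(scores))  # → [2]
```

Each score costs one extra render, so an SVG with n elements needs n + 1 renders in total. In a real pipeline the IoU would presumably be replaced by an image-level similarity metric and the threshold tuned rather than fixed at zero; both choices here are assumptions of the sketch.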

Abstract

Scalable Vector Graphics (SVG) represent visual content as structured, editable code. Each element (path, shape, or text node) can be individually inspected, transformed, or removed. This structural editability is a primary motivation for SVG generation. Yet prevailing evaluation protocols primarily reduce the output to a single similarity score against a reference image or input text, measuring how faithfully the result reproduces an image or follows the instructions, but not how well it preserves the structural properties that make SVG valuable. In particular, existing metrics cannot determine which generated elements contribute positively to overall visual quality, how visual concepts map to specific parts of the code, or whether the generated output supports meaningful downstream editing. We introduce element-level leave-one-out (LOO) analysis, inspired by the classic jackknife estimator. The procedure renders the SVG with and without each element, measures the resulting visual change, and derives a suite of structural quality metrics. Despite its simplicity, the jackknife's capacity to decompose an aggregate statistic into per-sample contributions translates directly to this setting. From a single mechanism, we obtain: (1) per-element quality scores, computed through LOO scoring, that enable zero-shot artifact detection; (2) concept-element attribution that maps each element to the visual concept it serves; and (3) four structural metrics (purity, coverage, compactness, and locality) that quantify SVG modularity from complementary perspectives. We validate these metrics on over 19,000 edits of 5 types across 5 generation systems and 3 complexity tiers.
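The same leave-one-out loop extends to concept-element attribution and to modularity metrics. The abstract does not give formulas, so everything below is a hypothetical sketch: each visual concept is assumed to come with a reference pixel mask (for example from a segmentation model), the per-concept quality Q_c is the fraction of that mask reproduced by the render, and each element is attributed to the concept whose quality drops most when the element is removed. The purity and coverage functions use illustrative definitions chosen here (how concentrated an element's effect is on a single concept, and what fraction of concepts some element accounts for). They are not the paper's formulas, and compactness and locality are omitted because the abstract does not define them.

```python
def render(elements):
    """Toy rasterizer: union of pixels covered by (x0, y0, x1, y1) rectangles (x1, y1 exclusive)."""
    return {(x, y) for (x0, y0, x1, y1) in elements
            for y in range(y0, y1) for x in range(x0, x1)}


def concept_recall(pixels, mask):
    """Q_c: fraction of concept c's (non-empty) reference mask reproduced by the render."""
    return len(pixels & mask) / len(mask)


def loo_concept_deltas(elements, concepts):
    """deltas[i][c] = Q_c(all elements) - Q_c(all elements except element i)."""
    full = render(elements)
    deltas = []
    for i in range(len(elements)):
        without = render(elements[:i] + elements[i + 1:])
        deltas.append({c: concept_recall(full, m) - concept_recall(without, m)
                       for c, m in concepts.items()})
    return deltas


def attribute(deltas):
    """Attribute each element to the concept it hurts most when removed (None if no drop)."""
    assignment = []
    for d in deltas:
        best = max(d, key=d.get)
        assignment.append(best if d[best] > 0 else None)
    return assignment


def purity(deltas, assignment):
    """Illustrative only: mean share of an attributed element's positive effect on its own concept."""
    shares = [d[c] / sum(v for v in d.values() if v > 0)
              for d, c in zip(deltas, assignment) if c is not None]
    return sum(shares) / len(shares) if shares else 0.0


def coverage(assignment, concepts):
    """Illustrative only: fraction of concepts with at least one attributed element."""
    return len({c for c in assignment if c is not None}) / len(concepts)


# Usage with hypothetical concept masks on a small grid.
concepts = {
    "sun": render([(0, 0, 4, 2)]),    # rows y = 0..1
    "house": render([(0, 2, 4, 6)]),  # rows y = 2..5
}
elements = [
    (0, 0, 4, 1),  # sun, row 0
    (0, 1, 4, 3),  # mixed: sun row 1 plus house row 2
    (0, 3, 4, 6),  # house, rows 3..5
    (6, 6, 7, 7),  # stray blob touching no concept
]
deltas = loo_concept_deltas(elements, concepts)
assignment = attribute(deltas)
print(assignment)  # → ['sun', 'sun', 'house', None]
print(purity(deltas, assignment), coverage(assignment, concepts))
```

In this toy example the second element paints part of the sun and part of the house, so it is attributed to the sun (its larger effect) but pulls mean purity below 1; the stray blob affects no concept and stays unattributed, which is the same signal the artifact detector relies on.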