Synthetic Designed Experiments for Diagnosing Vision Model Failure

arXiv cs.CV / 5/5/2026


Key Points

  • The paper argues that existing synthetic data pipelines for computer vision often use an “open-loop” approach that doesn’t explicitly diagnose which scene factors drive a model’s failure modes.
  • It proposes SDRS (Synthetic Designed Experiments for Representational Sufficiency), which uses Design of Experiments methods to audit a vision model’s factor-sensitivity profile by treating the model as a black box and the generator as an experimental apparatus.
  • SDRS decomposes sensitivity using ANOVA and classifies observed failures into two actionable gap types: Type I coverage gaps (underrepresented factor levels) and Type II spurious reliance gaps (dependence on nuisance variables).
  • Across three validation experiments (dSprites bias diagnosis, procedural segmentation shortcut detection, and entanglement/cross-factor contamination detection), targeted synthetic data guided by the audit substantially improves task metrics: dSprites accuracy rises from 49.9% to 79.0%, and segmentation mIoU from 0.948 to 0.998.
  • The work also identifies an open problem for representation-level correction: per-factor invariance penalties can transfer sensitivity from one factor to another, motivating further research.
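The audit described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the model is a black box scored by a per-image error function, each factor is varied across its levels while the others are randomized, and the fraction of error variance attributable to that factor (a one-way ANOVA eta-squared) decides whether it is flagged. The `audit` and `eta_squared` names, the threshold, and the scene-dictionary interface are all our assumptions.

```python
# Hypothetical sketch of an SDRS-style factor-sensitivity audit (not the
# paper's implementation). Per the paper's taxonomy: high sensitivity to
# a task-relevant factor suggests a Type I (coverage) gap, while high
# sensitivity to a nuisance factor suggests a Type II (spurious reliance) gap.
import numpy as np

def eta_squared(errors_by_level):
    """Between-level variance / total variance (one-way ANOVA effect size)."""
    all_errors = np.concatenate(errors_by_level)
    grand_mean = all_errors.mean()
    ss_total = ((all_errors - grand_mean) ** 2).sum()
    ss_between = sum(len(e) * (e.mean() - grand_mean) ** 2
                     for e in errors_by_level)
    return ss_between / ss_total if ss_total > 0 else 0.0

def audit(model_error, factors, n_per_level=200, threshold=0.1, seed=0):
    """factors: dict name -> (levels, is_nuisance). Returns per-factor labels."""
    rng = np.random.default_rng(seed)
    report = {}
    for name, (levels, is_nuisance) in factors.items():
        errors_by_level = []
        for level in levels:
            # Hold the audited factor fixed; randomize all the others.
            scenes = [{n: (level if n == name else rng.choice(f[0]))
                       for n, f in factors.items()}
                      for _ in range(n_per_level)]
            errors_by_level.append(np.array([model_error(s) for s in scenes]))
        eta2 = eta_squared(errors_by_level)
        if eta2 < threshold:
            report[name] = ("ok", eta2)
        else:
            gap = ("Type II (spurious reliance)" if is_nuisance
                   else "Type I (coverage)")
            report[name] = (gap, eta2)
    return report
```

For example, a toy model whose error depends only on a nuisance `texture` factor would have `texture` flagged as a Type II gap while a task factor such as `shape` passes the audit.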

Abstract

Current synthetic data pipelines for computer vision generate images without diagnosing what the downstream model actually needs. This open-loop paradigm treats synthetic data as cheap real data, randomly sampling the generator's output space and hoping to cover the model's failure modes. We argue this fundamentally misuses synthetic data's unique property: the controllable, independent variation of scene factors. Drawing on the statistical theory of Design of Experiments (DoE), we propose Synthetic Designed Experiments for Representational Sufficiency (SDRS). SDRS treats the downstream model as a black-box system and the synthetic generator as an experimental apparatus. Using fractional factorial designs, SDRS efficiently audits a model's factor-sensitivity profile via ANOVA decomposition. It classifies failures into two actionable types: Type I gaps (coverage failures on underrepresented factor levels) and Type II gaps (reliance on spurious nuisance dependencies). The audit then prescribes targeted synthetic data to address each gap type. We validate SDRS on three experiments: (1) a controlled diagnostic on dSprites with planted biases, where the audit correctly identifies both gap types and targeted data improves accuracy from 49.9% to 79.0%; (2) a dense segmentation task on procedural scenes, where detecting background-complexity shortcuts and applying targeted data improves mIoU from 0.948 to 0.998; and (3) an entanglement detection experiment showing that the ANOVA audit identifies cross-factor contamination in imperfect generators. Finally, we show that per-factor invariance penalties can transfer sensitivity between factors, identifying an open problem for representation-level correction.
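The fractional factorial designs the abstract mentions are the classical DoE device for probing many factors with few runs. As a minimal sketch (our construction, not necessarily the design family the paper uses), a two-level half-fraction of a full 2^k factorial fixes the last factor to the product of the others, halving the number of rendered scenes while keeping main effects estimable:

```python
# Sketch of a 2^(k-1) half-fraction factorial design over k two-level
# factors, using the defining relation X_k = X_1 * ... * X_{k-1}.
# Each run assigns a +1/-1 (high/low) level to every scene factor; one
# image per run, plus an ANOVA fit of main effects, estimates factor
# sensitivities with half the renders of a full factorial.
from itertools import product

def half_fraction(k):
    """Return the 2^(k-1) runs of a half-fraction design for k factors."""
    runs = []
    for base in product([-1, 1], repeat=k - 1):
        aliased = 1
        for level in base:
            aliased *= level  # defining relation for the last factor
        runs.append(base + (aliased,))
    return runs

design = half_fraction(4)  # 8 runs instead of the 16 of a full 2^4 design
```

A consequence of the defining relation is that the product of levels in every run equals +1, which is exactly the aliasing pattern that trades a halved run count for confounding the highest-order interaction with the mean.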