Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

arXiv cs.CV / 4/6/2026

Key Points

The paper argues that current synthetic-data learning approaches often improve robustness only indirectly and may fail to explicitly steer models toward the input-space regions that matter for discrimination.
It proposes using provenance information from the synthetic data generation process to identify target-versus-non-target regions, then applies provenance-based input gradient guidance to suppress gradients from non-target regions.
By decomposing input gradients according to target and non-target origin during synthesis, the method aims to prevent learning spurious correlations driven by synthesis biases and artifacts.
Experiments on weakly supervised object localization, spatio-temporal action localization, and image classification indicate that the approach is effective across multiple tasks and modalities.

Abstract

Learning methods using synthetic data have attracted attention as an effective approach for increasing the diversity of training data while reducing collection costs, thereby improving the robustness of model discrimination. However, many existing methods improve robustness only indirectly through the diversification of training samples and do not explicitly teach the model which regions in the input space truly contribute to discrimination; consequently, the model may learn spurious correlations caused by synthesis biases and artifacts. Motivated by this limitation, this paper proposes a learning framework that uses provenance information obtained during the training data synthesis process, indicating whether each region in the input space originates from the target object, as an auxiliary supervisory signal to promote the acquisition of representations focused on target regions. Specifically, input gradients are decomposed based on information about target and non-target regions during synthesis, and input gradient guidance is introduced to suppress gradients over non-target regions. This suppresses the model's reliance on non-target regions and directly promotes the learning of discriminative representations for target regions. Experiments demonstrate the effectiveness and generality of the proposed method across multiple tasks and modalities, including weakly supervised object localization, spatio-temporal action localization, and image classification.
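
The abstract does not give the exact loss, so the following is only a minimal sketch of the general idea, under stated assumptions: a toy logistic model on a 1-D input, a copy-paste style synthesis step that records a binary provenance mask, and a squared-norm penalty on the non-target part of the input gradient. The function names (`synthesize`, `input_gradient`, `decompose`, `guidance_penalty`, `guided_loss`) and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math

def synthesize(target, background, offset):
    """Paste a target patch onto a background (assumed synthesis step).
    Returns the sample and a provenance mask: 1.0 where the value
    came from the target, 0.0 where it came from the background."""
    x = list(background)
    mask = [0.0] * len(background)
    for i, v in enumerate(target):
        x[offset + i] = v
        mask[offset + i] = 1.0
    return x, mask

def predict(w, b, x):
    """Toy logistic model: probability of the positive class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x.
    For a logistic model this is (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def decompose(grad, mask):
    """Split the input gradient into target and non-target parts."""
    g_target = [g * m for g, m in zip(grad, mask)]
    g_non_target = [g * (1.0 - m) for g, m in zip(grad, mask)]
    return g_target, g_non_target

def guidance_penalty(grad, mask):
    """Squared L2 norm of the non-target gradient (assumed penalty form)."""
    _, g_non_target = decompose(grad, mask)
    return sum(g * g for g in g_non_target)

def guided_loss(w, b, x, y, mask, lam):
    """Task loss plus lam times the non-target gradient penalty."""
    p = predict(w, b, x)
    bce = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return bce + lam * guidance_penalty(input_gradient(w, b, x, y), mask)

# Small demo: a two-element target pasted into a five-element background.
x, mask = synthesize([1.0, 2.0], [0.0] * 5, offset=1)
w, b, y = [0.5, 1.0, -1.0, 0.25, 0.0], 0.0, 1.0
print(mask, guidance_penalty(input_gradient(w, b, x, y), mask))
```

In this toy setting the penalty shrinks as the weights on background positions shrink, which is the sense in which reliance on non-target regions is suppressed. With a deep network, minimizing such a term would require differentiating through the input gradient itself (double backpropagation), which an automatic differentiation framework would handle in place of the closed form used here.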