Make It Up: Fake Images, Real Gains in Generalized Few-shot Semantic Segmentation

arXiv cs.CV / 3/31/2026


Key Points

  • The paper addresses a key limitation in generalized few-shot semantic segmentation: scarce annotations lead to poor coverage of novel-class appearances and noisy supervision when masks are unreliable or missing.
  • It introduces Syn4Seg, which uses diffusion-generated synthetic images to expand novel-class coverage while improving pseudo-label quality through support-guided refinement.
  • Syn4Seg builds an embedding-deduplicated prompt bank per novel class to create diverse yet class-consistent synthetic images, aiming to better cover the prompt/appearance space.
  • It estimates pseudo-labels with a two-stage process that first extracts high-precision seed regions using consistency filtering, then relabels uncertain pixels via image-adaptive prototypes combining global support and local image statistics.
  • The method refines only boundary-band and unlabeled pixels using a constrained SAM-based update to improve contour fidelity without overwriting high-confidence interior regions, with experiments showing consistent gains on PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings.
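The paper does not include code, but the embedding-deduplicated prompt bank described above can be illustrated with a minimal sketch. The idea, as stated, is to keep prompts that are diverse in embedding space while staying class-consistent; a simple greedy filter on pairwise cosine similarity captures this. The function name, the threshold value, and the use of precomputed embeddings (in practice these would come from a text encoder such as CLIP) are all illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def dedup_prompt_bank(embeddings, threshold=0.9):
    """Greedy deduplication sketch (hypothetical, not the paper's code).

    Keeps the index of each embedding whose cosine similarity to every
    already-kept embedding stays below the threshold, so the retained
    prompts spread out over the embedding space.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, v in enumerate(normed):
        # Accept prompt i only if it is not a near-duplicate of a kept one.
        if all(float(v @ normed[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy example: two near-duplicates of the first vector and one distinct vector.
emb = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [1.0, 0.01]])
print(dedup_prompt_bank(emb, threshold=0.95))  # → [0, 2]
```

A greedy pass like this is order-dependent; a real pipeline might instead cluster embeddings and keep one prompt per cluster, but the coverage-versus-consistency trade-off is the same.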

Abstract

Generalized few-shot semantic segmentation (GFSS) is fundamentally limited by the coverage of novel-class appearances under scarce annotations. While diffusion models can synthesize novel-class images at scale, practical gains are often hindered by insufficient coverage and noisy supervision when masks are unavailable or unreliable. We propose Syn4Seg, a generation-enhanced GFSS framework designed to expand novel-class coverage while improving pseudo-label quality. Syn4Seg first maximizes prompt-space coverage by constructing an embedding-deduplicated prompt bank for each novel class, yielding diverse yet class-consistent synthetic images. It then performs support-guided pseudo-label estimation via a two-stage refinement that i) filters low-consistency regions to obtain high-precision seeds and ii) relabels uncertain pixels with image-adaptive prototypes that combine global (support) and local (image) statistics. Finally, we refine only boundary-band and unlabeled pixels using a constrained SAM-based update to improve contour fidelity without overwriting high-confidence interiors. Extensive experiments on PASCAL-5^i and COCO-20^i demonstrate consistent improvements in both 1-shot and 5-shot settings, highlighting synthetic data as a scalable path for GFSS with reliable masks and precise boundaries.
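The second refinement stage, relabeling uncertain pixels with image-adaptive prototypes that mix global support statistics with local image statistics, can be sketched as follows. All names, the linear mixing weight `alpha`, and the cosine-similarity decision rule are assumptions for illustration; the paper may combine the two statistics differently:

```python
import numpy as np

def relabel_uncertain(features, seed_mask, uncertain_mask, support_proto, alpha=0.5):
    """Hypothetical sketch of image-adaptive prototype relabeling.

    features:       (H, W, C) per-pixel feature map
    seed_mask:      (H, W) bool, high-precision foreground seeds
    uncertain_mask: (H, W) bool, pixels left undecided after filtering
    support_proto:  (C,) global prototype averaged over support features
    alpha:          assumed mixing weight between global and local statistics
    """
    # Local prototype from the image's own high-confidence seed pixels.
    local_proto = features[seed_mask].mean(axis=0)
    proto = alpha * support_proto + (1.0 - alpha) * local_proto
    proto = proto / np.linalg.norm(proto)

    # Cosine similarity of every pixel feature to the mixed prototype.
    feats = features / np.linalg.norm(features, axis=-1, keepdims=True)
    sim = feats @ proto

    # Seeds are kept as-is; only uncertain pixels are (re)labeled.
    label = seed_mask.copy()
    label[uncertain_mask] = sim[uncertain_mask] > 0.5
    return label
```

The key property mirrored from the abstract is that high-precision seeds are never overwritten: the prototype decision applies only to the uncertain set.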
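The final step, a constrained SAM-based update restricted to boundary-band and unlabeled pixels, can be approximated by masking the update to a thin band around the pseudo-mask contour. This sketch assumes the SAM prediction is already available as a binary mask and uses morphological dilation/erosion to define the band; the band width and the SAM invocation itself are placeholders, not the paper's actual procedure:

```python
import numpy as np
from scipy import ndimage

def constrained_update(pseudo_mask, sam_mask, band_width=2):
    """Illustrative sketch: overwrite only a thin boundary band of the
    pseudo-mask with the SAM prediction, so high-confidence interior
    pixels are left untouched."""
    dilated = ndimage.binary_dilation(pseudo_mask, iterations=band_width)
    eroded = ndimage.binary_erosion(pseudo_mask, iterations=band_width)
    band = dilated & ~eroded  # pixels within band_width of the contour
    out = pseudo_mask.copy()
    out[band] = sam_mask[band]
    return out
```

Restricting the write to `band` is what makes the update "constrained": even if SAM disagrees with the whole mask, only contour-adjacent pixels can change, which matches the abstract's goal of improving contour fidelity without overwriting high-confidence interiors.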