Few-Shot Semantic Segmentation Meets SAM3

arXiv cs.CV / 4/8/2026


Key Points

  • The paper revisits few-shot semantic segmentation (FSS) and proposes a training-free approach using the Segment Anything Model 3 (SAM3) instead of expensive episodic representation learning.
  • It repurposes SAM3’s Promptable Concept Segmentation by concatenating support and query images on a shared spatial canvas, enabling segmentation with a fully frozen model and no architectural changes.
  • Experiments on PASCAL-$5^i$ and COCO-$20^i$ report state-of-the-art results, outperforming many more heavily engineered methods.
  • The study finds that negative prompts can hurt performance in few-shot settings by weakening target representations and causing prediction collapse, revealing limits in how foundation models handle conflicting prompt signals.
  • The authors release code to support replication and further exploration of spatial prompt formulations for cross-image reasoning in FSS.
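The spatial concatenation idea behind the approach can be sketched in a few lines. The sketch below is illustrative only: the SAM3 inference call is elided (its API is not described here), and the helper names, shapes, and the placeholder `sam3_pcs` are assumptions, not the authors' implementation.

```python
import numpy as np

def make_canvas(support_img: np.ndarray, query_img: np.ndarray) -> np.ndarray:
    """Place support and query images side by side on one shared canvas.

    Assumes both images are HxWx3 arrays with the same height; a real
    pipeline would resize or pad to a common size first.
    """
    return np.concatenate([support_img, query_img], axis=1)

def crop_query_mask(canvas_mask: np.ndarray, query_width: int) -> np.ndarray:
    """Recover the query-side prediction from a mask over the full canvas."""
    return canvas_mask[:, -query_width:]

# Illustrative flow: concatenate, (hypothetically) run a frozen SAM3 on the
# canvas with the support annotation as the concept prompt, then crop out
# the query half of the predicted mask.
support = np.zeros((256, 256, 3), dtype=np.uint8)
query = np.zeros((256, 256, 3), dtype=np.uint8)
canvas = make_canvas(support, query)           # shape (256, 512, 3)
# canvas_mask = sam3_pcs(canvas, prompt=...)   # placeholder: actual API not given
canvas_mask = np.zeros(canvas.shape[:2], dtype=bool)
query_mask = crop_query_mask(canvas_mask, query.shape[1])  # shape (256, 256)
```

Because the model sees support and query pixels in one image, its prompt-conditioned attention can relate them without any fine-tuning, which is the core of the training-free formulation described above.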

Abstract

Few-Shot Semantic Segmentation (FSS) focuses on segmenting novel object categories from only a handful of annotated examples. Most existing approaches rely on extensive episodic training to learn transferable representations, which is both computationally demanding and sensitive to distribution shifts. In this work, we revisit FSS from the perspective of modern vision foundation models and explore the potential of Segment Anything Model 3 (SAM3) as a training-free solution. By repurposing its Promptable Concept Segmentation (PCS) capability, we adopt a simple spatial concatenation strategy that places support and query images into a shared canvas, allowing a fully frozen SAM3 to perform segmentation without any fine-tuning or architectural changes. Experiments on PASCAL-5^i and COCO-20^i show that this minimal design already achieves state-of-the-art performance, outperforming many heavily engineered methods. Beyond empirical gains, we uncover that negative prompts can be counterproductive in few-shot settings, where they often weaken target representations and lead to prediction collapse despite their intended role in suppressing distractors. These findings suggest that strong cross-image reasoning can emerge from simple spatial formulations, while also highlighting limitations in how current foundation models handle conflicting prompt signals. Code at: https://github.com/WongKinYiu/FSS-SAM3