IP-SAM: Prompt-Space Conditioning for Prompt-Absent Camouflaged Object Detection

arXiv cs.CV / 3/31/2026



Key Points

IP-SAM addresses a deployment mismatch in prompt-conditioned segmentation by conditioning the model in prompt space so it can segment when no external prompts are available at inference.
The method uses a Self-Prompt Generator (SPG) to derive intrinsic, coarse regional anchor prompts from image context and feeds them through SAM2’s frozen prompt encoder to preserve the native prompt interface.
Prompt-Space Gating (PSG) suppresses background-driven false positives by applying an asymmetric constraint using an intrinsic background prompt before decoding.
Experiments report state-of-the-art performance on four camouflaged object detection (COD) benchmarks with only 21.26M trainable parameters. These cover SPG, PSG, a task-specific mask decoder trained from scratch, and LoRA adapters on the image encoder; the prompt encoder stays frozen.
The prompt-space conditioning strategy also transfers beyond COD, showing strong zero-shot generalization from Kvasir-SEG to CVC-ClinicDB and ETIS for medical polyp segmentation.
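The Key Points describe two ingredients: self-generated foreground/background anchor prompts, and an asymmetric background gate applied in prompt space before decoding. The summary gives no pseudocode, so the following is only a minimal, hypothetical sketch of those two ideas on toy Python lists. `self_prompts`, `cosine`, and `prompt_space_gate` are illustrative names. The argmax/argmin anchor rule stands in for the learned SPG, and the cosine-based gate stands in for PSG. Neither is the paper's actual module. Only the point-label convention (1 = foreground, 0 = background) follows SAM-style prompt interfaces.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]


def self_prompts(coarse_map: Grid) -> Tuple[List[Point], List[int]]:
    """Hypothetical SPG stand-in (not the paper's learned generator).

    Picks the highest- and lowest-scoring cells of a coarse foreground map as
    one foreground anchor and one background anchor. Points are (x, y) =
    (column, row); labels use the SAM-style convention 1 = fg, 0 = bg.
    """
    cells = [(v, r, c) for r, row in enumerate(coarse_map) for c, v in enumerate(row)]
    _, fg_r, fg_c = max(cells)  # most object-like cell
    _, bg_r, bg_c = min(cells)  # least object-like cell
    return [(fg_c, fg_r), (bg_c, bg_r)], [1, 0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prompt_space_gate(
    fg_emb: Sequence[float], bg_emb: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical asymmetric gate on prompt embeddings, applied before decoding.

    The background embedding passes through unchanged and can only shrink the
    foreground embedding (gate clamped to [0, 1]). The foreground never alters
    the background, and negative similarity is clamped so it cannot amplify.
    """
    gate = max(0.0, 1.0 - max(cosine(fg_emb, bg_emb), 0.0))
    return [gate * x for x in fg_emb], list(bg_emb)
```

For example, `self_prompts([[0.1, 0.2, 0.1], [0.3, 0.9, 0.2], [0.0, 0.4, 0.1]])` yields a foreground point at (1, 1) and a background point at (0, 2). A foreground embedding identical to the background one is gated to zero, while an orthogonal or opposite one passes through unchanged. The one-directional gate is what "asymmetric" refers to in this sketch.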

Abstract

Prompt-conditioned foundation segmenters have emerged as a dominant paradigm for image segmentation, where explicit spatial prompts (e.g., points, boxes, masks) guide mask decoding. However, many real-world deployments require fully automatic segmentation, creating a structural mismatch: the decoder expects prompts that are unavailable at inference. Existing adaptations typically modify intermediate features, inadvertently bypassing the model's native prompt interface and weakening prompt-conditioned decoding. We propose IP-SAM, which revisits adaptation from a prompt-space perspective through prompt-space conditioning. Specifically, a Self-Prompt Generator (SPG) distills image context into complementary intrinsic prompts that serve as coarse regional anchors. These cues are projected through SAM2's frozen prompt encoder, restoring prompt-guided decoding without external intervention. To suppress background-induced false positives, Prompt-Space Gating (PSG) leverages the intrinsic background prompt as an asymmetric suppressive constraint prior to decoding. Under a deterministic no-external-prompt protocol, IP-SAM achieves state-of-the-art performance across four camouflaged object detection benchmarks (e.g., MAE 0.017 on COD10K) with only 21.26M trainable parameters (optimizing SPG, PSG, and a task-specific mask decoder trained from scratch, alongside image-encoder LoRA while keeping the prompt encoder frozen). Furthermore, the proposed conditioning strategy generalizes beyond COD to medical polyp segmentation, where a model trained solely on Kvasir-SEG exhibits strong zero-shot transfer to both CVC-ClinicDB and ETIS.
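To make the headline "MAE 0.017 on COD10K" concrete: in the COD literature, MAE is commonly computed per image as the mean absolute difference between the predicted map and the binary ground-truth mask, then averaged over the test set. The sketch below assumes that convention and a prediction already scaled to [0, 1]. The paper's exact evaluation pipeline (resizing, normalization) may differ, and `mae` / `dataset_mae` are illustrative names.

```python
from typing import Iterable, List, Tuple

Map = List[List[float]]


def mae(pred: Map, gt: List[List[int]]) -> float:
    """Per-image MAE between a [0, 1] prediction map and a binary {0, 1} mask."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")
    total = sum(abs(p - g) for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row))
    n_pixels = sum(len(row) for row in pred)
    return total / n_pixels


def dataset_mae(pairs: Iterable[Tuple[Map, List[List[int]]]]) -> float:
    """Average the per-image MAE over a test set (lower is better)."""
    scores = [mae(p, g) for p, g in pairs]
    return sum(scores) / len(scores)
```

For a 2x2 prediction `[[0.0, 0.5], [1.0, 1.0]]` against the mask `[[0, 1], [1, 0]]`, the per-pixel errors are 0, 0.5, 0 and 1, giving an MAE of 0.375. An MAE of 0.017 therefore means predictions deviate from the mask by under two percent per pixel on average.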
