TF-SSD: A Strong Pipeline via Synergic Mask Filter for Training-free Co-salient Object Detection

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces TF-SSD, a training-free co-salient object detection pipeline designed to better generalize beyond closed-set training constraints typical of prior methods.
  • TF-SSD synergizes SAM and DINO by using SAM to generate raw mask proposals, then filtering redundant masks with a quality mask generator.
  • Because the SAM-based filter lacks saliency semantics, TF-SSD adds an intra-image saliency filter that leverages DINO attention maps to select visually salient masks per image.
  • To ensure consistency across a group of related images, it further proposes an inter-image prototype selector that compares cross-image prototype similarities and keeps the highest-scoring masks as final predictions.
  • Experiments report that TF-SSD outperforms existing approaches, including a stated 13.7% improvement over the most recent training-free baseline, with code released on GitHub.

Abstract

Co-salient Object Detection (CoSOD) aims to segment salient objects that consistently appear across a group of related images. Despite the notable progress achieved by recent training-based approaches, they remain constrained by closed-set datasets and exhibit limited generalization. Meanwhile, few studies have explored the potential of Vision Foundation Models (VFMs), which demonstrate strong generalization and robust saliency understanding, for CoSOD. In this paper, we investigate and leverage VFMs for CoSOD and propose a novel training-free method, TF-SSD, built on the synergy between SAM and DINO. Specifically, we first use SAM to generate comprehensive raw proposals, which serve as a candidate mask pool. We then introduce a quality mask generator to filter out redundant masks, yielding a refined mask set. Since this generator is built upon SAM, it inherently lacks a semantic understanding of saliency. To address this, we adopt an intra-image saliency filter that employs DINO's attention maps to identify visually salient masks within individual images. Moreover, to extend saliency understanding across the image group, we propose an inter-image prototype selector, which computes similarity scores among cross-image prototypes and selects the masks with the highest scores. These selected masks serve as the final CoSOD predictions. Extensive experiments show that TF-SSD outperforms existing methods (e.g., a 13.7% gain over the recent training-free method). Code is available at https://github.com/hzz-yy/TF-SSD.
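The three-stage filtering-and-selection pipeline the abstract describes can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`dedup_masks`, `saliency_filter`, `select_co_salient`), the IoU threshold, and the mean-cosine-similarity prototype scoring are all guesses at plausible mechanics, and synthetic random arrays stand in for real SAM mask proposals and DINO attention maps / features.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def dedup_masks(masks, iou_thresh=0.8):
    """Stand-in for the quality mask generator: greedily drop any
    proposal that overlaps an already-kept mask above iou_thresh."""
    kept = []
    for m in masks:
        if all(mask_iou(m, k) < iou_thresh for k in kept):
            kept.append(m)
    return kept

def saliency_filter(masks, attn, top_k=2):
    """Intra-image filter: rank masks by mean attention inside the mask
    (attn stands in for a DINO attention map) and keep the top_k."""
    scores = [attn[m].mean() if m.any() else 0.0 for m in masks]
    order = np.argsort(scores)[::-1][:top_k]
    return [masks[i] for i in order]

def prototype(feats, mask):
    """L2-normalized mean feature vector over the mask region."""
    v = feats[mask].mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def select_co_salient(group_masks, group_feats):
    """Inter-image selector: for each image, keep the mask whose prototype
    has the highest mean cosine similarity to the other images'
    candidate prototypes."""
    protos = [[prototype(f, m) for m in ms]
              for ms, f in zip(group_masks, group_feats)]
    picks = []
    for i, ps in enumerate(protos):
        others = np.vstack([p for j, qs in enumerate(protos) if j != i for p in qs])
        scores = [float((others @ p).mean()) for p in ps]
        picks.append(group_masks[i][int(np.argmax(scores))])
    return picks

# Toy group of two 8x8 "images": a shared object in the top-left corner
# plus image-specific background clutter (all data here is synthetic).
rng = np.random.default_rng(0)
H = W = 8
obj = np.zeros((H, W), bool); obj[:4, :4] = True
bg = np.zeros((H, W), bool); bg[4:, 4:] = True
attn = np.where(obj, 1.0, 0.3)               # object region is most "salient"
feats, group_masks = [], []
for k in range(2):
    f = rng.normal(scale=0.1, size=(H, W, 4))
    f[:4, :4, 0] += 5.0                      # shared co-salient signature
    f[4:, 4:, 1 + k] += 5.0                  # per-image distractor signature
    feats.append(f)
    # raw pool with a duplicate proposal and a background distractor
    refined = dedup_masks([obj, obj.copy(), bg])
    group_masks.append(saliency_filter(refined, attn, top_k=2))
picks = select_co_salient(group_masks, feats)
```

In this toy run, the duplicate proposal is removed by the IoU dedup, the background distractor survives the intra-image filter (it has some attention mass) but loses at the inter-image stage, and both images end up selecting the shared object mask.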