Tarot-SAM3: Training-free SAM3 for Any Referring Expression Segmentation
arXiv cs.CV / 4/10/2026
Key Points
- The paper addresses Referring Expression Segmentation (RES), which segments image regions described by natural-language queries, and highlights limitations of prior approaches that depend on large labeled datasets and struggle with implicit or long expressions.
- Building on SAM3’s robustness in promptable concept segmentation, the authors propose Tarot-SAM3 to enable accurate segmentation from any referring expression in a training-free manner.
- Tarot-SAM3 uses an Expression Reasoning Interpreter (ERI) to produce reasoning-assisted, rephrased, heterogeneous prompts that improve structured parsing of diverse queries for SAM3.
- It further applies Mask Self-Refining (MSR) to select the best mask type and refine segmentation by using DINOv3-derived feature relationships to correct over- and under-segmentation.
- Experiments and ablations report strong results across explicit, implicit, and open-world RES benchmarks, with each phase validated as contributing to overall performance.
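
The summary above does not spell out how MSR uses feature relationships, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a grid of per-patch feature vectors (standing in for DINOv3 features) and an initial binary mask. It builds a prototype from the masked patches, drops masked patches that are dissimilar to it (over-segmentation), and adds unmasked patches that are highly similar to it (under-segmentation). The function name `refine_mask` and both thresholds are hypothetical.

```python
import numpy as np


def refine_mask(features, mask, add_thresh=0.8, drop_thresh=0.3):
    """Refine a mask with patch-feature similarity (illustrative only).

    features: (H, W, D) array of per-patch features (assumed to come from
              a self-supervised backbone such as DINOv3).
    mask:     (H, W) boolean array, the initial segmentation.
    add_thresh, drop_thresh: hypothetical cosine-similarity cutoffs.
    """
    if not mask.any():
        # Nothing to build a prototype from; return the mask unchanged.
        return mask.copy()

    # L2-normalize features so dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    unit = features / np.clip(norms, 1e-8, None)

    # Prototype = normalized mean feature of the currently masked patches.
    prototype = unit[mask].mean(axis=0)
    prototype = prototype / max(np.linalg.norm(prototype), 1e-8)

    # Cosine similarity of every patch to the prototype, shape (H, W).
    sim = unit @ prototype

    refined = mask.copy()
    # Over-segmentation: remove masked patches that do not resemble the object.
    refined[mask & (sim < drop_thresh)] = False
    # Under-segmentation: add unmasked patches that strongly resemble it.
    refined[~mask & (sim > add_thresh)] = True
    return refined
```

In practice the thresholds would need tuning for each feature extractor, and a real pipeline would likely also enforce spatial connectivity. The sketch shows only the similarity-based add and drop steps.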