TALENT: Target-aware Efficient Tuning for Referring Image Segmentation

arXiv cs.CV / 4/2/2026


Key Points

  • Analyzes and quantifies the "non-target activation" (NTA) problem that arises when parameter-efficient tuning (PET) is applied to referring image segmentation (RIS), the task of segmenting only the target specified in an image by a natural-language referring expression.
  • The proposed method, TALENT, combines a Rectified Cost Aggregator (RCA), which efficiently aggregates text-referred features, with a Target-aware Learning Mechanism (TLM), which suppresses NTA and calibrates activation onto the correct target.
  • TLM jointly performs contextual pairwise consistency learning, which uses sentence-level text features to learn contextual consistency, and target-centric contrastive learning, which suppresses associations with other instances through target-centered contrast.
  • In experiments, TALENT outperforms existing methods on multiple metrics, e.g., a reported 2.5% mIoU gain on the G-Ref val set.
  • The code is slated for release on GitHub, making the implementation available for improving PET-based RIS in practice.
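The target-centric contrastive learning described above can be sketched as an InfoNCE-style objective: pixel features on the referred instance are pulled toward the sentence embedding while other (e.g. co-category distractor) pixels are pushed away. The function name, shapes, and temperature below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def target_contrastive_loss(pixel_feats, text_feat, target_mask, tau=0.07):
    """InfoNCE-style sketch: concentrate softmax mass over pixel-text
    similarities on the target pixels. All names/shapes are hypothetical."""
    # normalize features so dot products are cosine similarities
    pf = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    tf = text_feat / np.linalg.norm(text_feat)
    sim = pf @ tf / tau                    # (N,) temperature-scaled logits
    exp = np.exp(sim - sim.max())          # numerically stabilized softmax terms
    pos = exp[target_mask].sum()           # mass assigned to target pixels
    return float(-np.log(pos / exp.sum()))  # low when the target dominates
```

The loss is small when text-aligned pixels coincide with the target mask and grows when non-target pixels carry the text similarity, which is exactly the NTA failure mode the paper targets.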

Abstract

Referring image segmentation aims to segment specific targets based on a natural text expression. Recently, parameter-efficient tuning (PET) has emerged as a promising paradigm. However, existing PET-based methods often suffer from the fact that visual features cannot emphasize the text-referred target instance but instead activate co-category yet unrelated objects. We analyze and quantify this problem, terming it the "non-target activation" (NTA) issue. To address this, we propose a novel framework, TALENT, which utilizes target-aware efficient tuning for PET-based RIS. Specifically, we first propose a Rectified Cost Aggregator (RCA) to efficiently aggregate text-referred features. Then, to calibrate "NTA" into accurate target activation, we adopt a Target-aware Learning Mechanism (TLM), including contextual pairwise consistency learning and target-centric contrastive learning. The former uses the sentence-level text feature to achieve a holistic understanding of the referent and constructs a text-referred affinity map to optimize the semantic association of visual features. The latter further enhances target localization to discover the distinct instance while suppressing associations with other unrelated ones. The two objectives work in concert and address "NTA" effectively. Extensive evaluations show that TALENT outperforms existing methods across various metrics (e.g., 2.5% mIoU gains on the G-Ref val set). Our codes will be released at: https://github.com/Kimsure/TALENT.
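The abstract's "text-referred affinity map" for contextual pairwise consistency can be sketched minimally: score each pixel's relevance to the sentence embedding, form a pairwise affinity target from those scores, and pull the visual pairwise similarities toward it. The function name and the squared-error loss form are assumptions for illustration only; the paper's exact formulation may differ:

```python
import numpy as np

def pairwise_consistency_loss(visual_feats, sent_feat):
    """Sketch of contextual pairwise consistency: align visual pixel-pixel
    similarity with a text-referred affinity map. Hypothetical API."""
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    s = sent_feat / np.linalg.norm(sent_feat)
    rel = v @ s                          # (N,) relevance of each pixel to the referent
    affinity_text = np.outer(rel, rel)   # (N, N) text-referred affinity map
    affinity_vis = v @ v.T               # (N, N) visual pairwise similarity
    return float(np.mean((affinity_vis - affinity_text) ** 2))
```

When every pixel relates to the referent consistently the two affinity maps coincide and the loss vanishes; mismatched pairs (visually similar but differently related to the text, or vice versa) are penalized, encouraging the semantic association the abstract describes.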