Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

arXiv cs.CV / April 22, 2026


Key Points

  • The paper argues that Composed Image Retrieval (CIR) is strongly limited by the Noisy Triplet Correspondence (NTC) problem, where semantic ambiguity breaks the common “small loss hypothesis” used by robust learning methods.
  • It proposes Air-Know (ArbIteR calibrated Knowledge iNternalizing rObust netWork), an approach that breaks the self-dependent feedback loop between the learner and its noise arbiter, a loop that can cause catastrophic representation pollution.
  • Air-Know uses an “Expert-Proxy-Diversion” decoupling framework with three modules: External Prior Arbitration (EPA) using multimodal LLMs to build a high-precision anchor dataset, Expert Knowledge Internalization (EKI) to train a lightweight proxy arbiter, and Dual Stream Reconciliation (DSR) to divert data based on matching confidence.
  • Experiments on multiple CIR benchmarks show Air-Know substantially improves state-of-the-art performance specifically under the NTC noise setting, while remaining competitive in standard (non-NTC) CIR.
  • The work highlights a practical strategy for robust retrieval training: calibrating learning using offline expert knowledge and separating arbitration from representation learning via confidence-based data routing.

Abstract

Composed Image Retrieval (CIR) has attracted significant attention due to its flexible multimodal query method, yet its development is severely constrained by the Noisy Triplet Correspondence (NTC) problem. Most existing robust learning methods rely on the "small-loss hypothesis", but the unique semantic ambiguity in NTC, such as "partial matching", invalidates this assumption, leading to unreliable noise identification. This entraps the model in a self-dependent vicious cycle in which the learner is intertwined with the arbiter, ultimately causing catastrophic "representation pollution". To address this critical challenge, we propose a novel "Expert-Proxy-Diversion" decoupling paradigm, named Air-Know (ArbIteR calibrated Knowledge iNternalizing rObust netWork). Air-Know incorporates three core modules: (1) External Prior Arbitration (EPA), which utilizes Multimodal Large Language Models (MLLMs) as an offline expert to construct a high-precision anchor dataset; (2) Expert Knowledge Internalization (EKI), which efficiently guides a lightweight proxy "arbiter" to internalize the expert's discriminative logic; (3) Dual Stream Reconciliation (DSR), which leverages the proxy arbiter's matching confidence to divert the training data into a clean alignment stream and a representation feedback reconciliation stream. Extensive experiments on multiple CIR benchmark datasets demonstrate that Air-Know significantly outperforms existing SOTA methods under the NTC setting, while also showing strong competitiveness in traditional CIR.
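The DSR routing idea described above can be sketched in a few lines: a proxy arbiter assigns a matching confidence to each (reference image, modification text, target image) triplet, and a threshold splits the batch into a clean alignment stream and a reconciliation stream. This is a minimal illustrative sketch, not the paper's implementation; the function names, the threshold value `tau`, and the stub arbiter are all assumptions.

```python
def route_triplets(triplets, arbiter_confidence, tau=0.8):
    """Split training triplets by a proxy arbiter's matching confidence.

    triplets: iterable of triplet items (here, opaque IDs for illustration)
    arbiter_confidence: callable mapping a triplet to a score in [0, 1]
    tau: confidence threshold (illustrative value, not from the paper)
    """
    clean_stream, reconcile_stream = [], []
    for t in triplets:
        if arbiter_confidence(t) >= tau:
            clean_stream.append(t)      # high confidence: direct alignment loss
        else:
            reconcile_stream.append(t)  # low confidence: reconciliation handling
    return clean_stream, reconcile_stream


# Toy usage with a stub arbiter that looks up precomputed confidence scores.
scores = {"a": 0.95, "b": 0.40, "c": 0.85}
clean, noisy = route_triplets(["a", "b", "c"], scores.get)
print(clean, noisy)  # → ['a', 'c'] ['b']
```

In the real system the two streams would feed different loss terms; the sketch only shows the confidence-based diversion step that decouples arbitration from representation learning.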