MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network

arXiv cs.CV / 4/1/2026

Key Points

  • The paper addresses two common issues in Composed Image Retrieval (CIR): frequency bias that causes rare modification semantics to be neglected, and instability of similarity scores due to hard negative samples and noise.
  • It introduces MELT, a Modification frEquentation-rarity baLance neTwork that increases attention to rare modification semantics in multimodal (reference image + text) settings.
  • To improve robustness against hard negatives, MELT uses diffusion-based denoising to reduce the influence of hard negative samples with high similarity scores.
  • Experiments on two CIR benchmarks reportedly show that MELT achieves superior performance compared with existing CIR approaches.
  • The authors provide implementation code at the linked GitHub repository, enabling reproducibility and further experimentation.
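
The summary does not describe how MELT reweights rare modification semantics. Purely as an illustration of the general idea, the following minimal Python sketch assumes an IDF-style inverse-frequency weight per modification word and a simple additive fusion of normalized image and text embeddings. The function names (rarity_weights, compose_query, rank_gallery) and the toy vectors are hypothetical and are not taken from the MELT code, where learned attention would take the place of these fixed weights.

```python
import math
from collections import Counter


def rarity_weights(texts):
    """Smoothed IDF (hypothetical stand-in for MELT's rarity attention):
    words appearing in fewer modification texts get larger weights."""
    n = len(texts)
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(text.lower().split()))
    # log((1 + N) / (1 + df)) + 1 is always >= 1 and grows as df shrinks.
    return {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def compose_query(ref_img_vec, text, word_vecs, weights):
    """Fuse a reference-image embedding with a rarity-weighted bag-of-words
    text embedding by adding the two normalized vectors (assumed fusion)."""
    text_vec = [0.0] * len(ref_img_vec)
    for word in text.lower().split():
        if word in word_vecs:  # words without an embedding are skipped
            wt = weights.get(word, 1.0)
            text_vec = [a + wt * b for a, b in zip(text_vec, word_vecs[word])]
    fused = [a + b for a, b in zip(normalize(ref_img_vec), normalize(text_vec))]
    return normalize(fused)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def rank_gallery(query_vec, gallery):
    """Return gallery ids sorted by descending cosine similarity to the query."""
    return sorted(gallery, key=lambda gid: cosine(query_vec, gallery[gid]), reverse=True)
```

With a toy corpus such as ["change to red", "change to blue", "change to red dress", "make polka-dot"], rarity_weights gives the rare word "polka-dot" a larger weight than the frequent word "change", so a composed query for "change to polka-dot" leans toward the rare attribute when ranking a small gallery of 3-dimensional toy embeddings.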

Abstract

Composed Image Retrieval (CIR) uses a reference image and a modification text as a query to retrieve a target image satisfying the requirement of "modifying the reference image according to the text instructions". However, existing CIR methods face two limitations: (1) frequency bias leading to "Rare Sample Neglect", and (2) susceptibility of similarity scores to interference from hard negative samples and noise. To address these limitations, we confront two key challenges: asymmetric rare semantic localization and robust similarity estimation under hard negative samples. To solve these challenges, we propose the Modification frEquentation-rarity baLance neTwork (MELT). MELT assigns increased attention to rare modification semantics in multimodal contexts while applying diffusion-based denoising to hard negative samples with high similarity scores, enhancing multimodal fusion and matching. Extensive experiments on two CIR benchmarks validate the superior performance of MELT. Code is available at https://github.com/luckylittlezhi/MELT.
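
The abstract names diffusion-based denoising of high-similarity hard negatives but gives no details of it. The sketch below does not implement diffusion; it swaps in a much simpler technique to show where hard negatives enter a CIR training objective. It flags non-target candidates whose similarity falls within a margin of the target's (a hypothetical margin of 0.05) and down-weights them in an InfoNCE-style contrastive loss. All names, the margin, and the temperature of 0.07 are illustrative assumptions, not values from the paper.

```python
import math


def find_hard_negatives(scores, target_id, margin=0.05):
    """Non-target candidates scoring within `margin` of the target (or above it),
    sorted from most to least similar. `margin` is a hypothetical threshold."""
    threshold = scores[target_id] - margin
    hard = [cid for cid, s in scores.items() if cid != target_id and s >= threshold]
    return sorted(hard, key=lambda cid: scores[cid], reverse=True)


def weighted_infonce(scores, target_id, neg_weights, temperature=0.07):
    """InfoNCE-style loss: -log(exp(s_t/T) / (exp(s_t/T) + sum_j w_j * exp(s_j/T))).

    Each negative j has weight w_j = 1 unless overridden in `neg_weights`.
    Down-weighting suspected hard negatives is a simple stand-in for MELT's
    diffusion-based denoising, NOT the paper's method."""
    numerator = math.exp(scores[target_id] / temperature)
    denominator = numerator
    for cid, s in scores.items():
        if cid != target_id:
            denominator += neg_weights.get(cid, 1.0) * math.exp(s / temperature)
    return -math.log(numerator / denominator)
```

For scores {"target": 0.80, "near_dup": 0.82, "similar": 0.77, "easy": 0.30}, find_hard_negatives returns ["near_dup", "similar"], and halving their weights lowers the loss, so these possibly noisy near-duplicates contribute less to training than under the plain objective.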