G-MIXER: Geodesic Mixup-based Implicit Semantic Expansion and Explicit Semantic Re-ranking for Zero-Shot Composed Image Retrieval

arXiv cs.CV / 4/17/2026

Key Points

The paper introduces G-MIXER, a training-free method for zero-shot Composed Image Retrieval (CIR), a task that requires balancing the explicit semantics stated in the query with the implicit semantics of the image-text composition.
Unlike prior approaches that mostly depend on MLLM-generated textual descriptions, G-MIXER uses geodesic mixup across multiple mixup ratios to expand composed query features and produce a more diverse candidate set.
G-MIXER then re-ranks the generated candidates using explicit semantics obtained from Multimodal Large Language Models (MLLMs), improving both diversity and retrieval accuracy.
The method achieves state-of-the-art results on multiple ZS-CIR benchmarks without additional training, and the authors state that the code will be released in a GitHub repository.

Abstract

Composed Image Retrieval (CIR) aims to retrieve target images by integrating a reference image with a corresponding modification text. CIR requires jointly considering the explicit semantics specified in the query and the implicit semantics embedded within its bi-modal composition. Recent training-free Zero-Shot CIR (ZS-CIR) methods leverage Multimodal Large Language Models (MLLMs) to generate detailed target descriptions, converting the implicit information into explicit textual expressions. However, these methods rely heavily on the textual modality and fail to capture the fuzzy retrieval nature that requires considering diverse combinations of candidates. This leads to reduced diversity and accuracy in retrieval results. To address this limitation, we propose a novel training-free method, Geodesic Mixup-based Implicit semantic eXpansion and Explicit semantic Re-ranking for ZS-CIR (G-MIXER). G-MIXER constructs composed query features that reflect the implicit semantics of reference image-text pairs through geodesic mixup over a range of mixup ratios, and builds a diverse candidate set. The generated candidates are then re-ranked using explicit semantics derived from MLLMs, improving both retrieval diversity and accuracy. Our proposed G-MIXER achieves state-of-the-art performance across multiple ZS-CIR benchmarks, effectively handling both implicit and explicit semantics without additional training. Our code will be available at https://github.com/maya0395/gmixer.
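The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that the reference image, the modification text, the MLLM-generated target description, and the gallery images have already been embedded into one shared vision-language space, and it takes geodesic mixup to mean spherical interpolation between unit-norm features. The function names (`geodesic_mixup`, `expand_candidates`, `rerank`), the top-k union used to build the candidate set, and the cosine-similarity re-ranking are all assumptions made for illustration; the paper's actual scoring may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def geodesic_mixup(a, b, lam):
    """Spherical interpolation between a and b on the unit sphere.

    lam = 0 returns a, lam = 1 returns b (a convention assumed here).
    The result is undefined for exactly antipodal inputs.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    sin_theta = np.sin(theta)
    if sin_theta < 1e-6:
        # Nearly parallel vectors: fall back to normalized linear mixing.
        return l2_normalize((1.0 - lam) * a + lam * b)
    mixed = (np.sin((1.0 - lam) * theta) * a + np.sin(lam * theta) * b) / sin_theta
    return l2_normalize(mixed)


def expand_candidates(img_feat, txt_feat, gallery, ratios, top_k):
    """Union of the top-k gallery indices retrieved at each mixup ratio."""
    gallery = l2_normalize(gallery)
    pool = set()
    for lam in ratios:
        query = geodesic_mixup(img_feat, txt_feat, lam)
        scores = gallery @ query  # cosine similarity to every gallery item
        pool.update(int(i) for i in np.argsort(-scores)[:top_k])
    return sorted(pool)


def rerank(candidates, gallery, explicit_feat):
    """Order candidates by similarity to an explicit-description feature.

    `explicit_feat` stands in for the embedding of an MLLM-generated
    target description (hypothetical input for this sketch).
    """
    gallery = l2_normalize(gallery)
    scores = gallery[candidates] @ l2_normalize(explicit_feat)
    return [candidates[i] for i in np.argsort(-scores)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(100, 16))   # placeholder gallery features
    img_feat = rng.normal(size=16)         # placeholder reference-image feature
    txt_feat = rng.normal(size=16)         # placeholder modification-text feature
    explicit_feat = rng.normal(size=16)    # placeholder MLLM-description feature
    pool = expand_candidates(img_feat, txt_feat, gallery,
                             ratios=np.linspace(0.1, 0.9, 5), top_k=10)
    print(rerank(pool, gallery, explicit_feat)[:5])
```

Sweeping several ratios rather than fixing one lets queries that lean toward the image and queries that lean toward the text each contribute candidates, which is one plausible reading of how the method trades a single composed query for a more diverse candidate set before the explicit re-ranking step.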
