Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization Methods
arXiv cs.CV / 3/30/2026
Key Points
- The paper introduces a new Few-Shot Text-to-Image Retrieval (FSIR) benchmark task to address weaknesses of pre-trained vision-language models on compositional and out-of-distribution (OOD) image-text query pairs.
- It releases FSIR-BD, the first dataset explicitly tailored for image retrieval with text plus reference example images, covering two compositional subsets (urban scenes and nature species) and emphasizing hard negatives.
- FSIR-BD contains 38,353 images and 303 queries. Most queries are evaluated against a large test corpus that includes many positives and hard negatives, and the rest are used to form a few-shot reference set (FSR) of exemplar positives and hard negatives.
- The authors propose two new retrieval optimization methods that use single-shot or few-shot reference examples from FSR and are compatible with any pre-trained image encoder.
- Experiments show FSIR-BD is a challenging benchmark and that the proposed optimization methods improve retrieval quality over existing baselines, measured by mean Average Precision (mAP).
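
The summary reports results in mean Average Precision (mAP) but does not show how the metric is computed. Below is a minimal sketch of the standard definition, not the paper's evaluation code. The function names `average_precision` and `mean_average_precision` are hypothetical, and the sketch assumes each query's ranking covers the whole test corpus, so every positive appears somewhere in the list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average Precision for one query.

    `ranked_relevance[i]` is True if the image at rank i + 1 (best first)
    is a ground-truth positive for the query. Assumes the ranking covers
    the full corpus, so the number of hits equals the number of positives.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, accumulated only at positive positions.
            precision_sum += hits / rank
    # A query with no positives contributes 0 by convention in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_query_relevance: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query Average Precision values."""
    if not per_query_relevance:
        return 0.0
    total = sum(average_precision(flags) for flags in per_query_relevance)
    return total / len(per_query_relevance)


if __name__ == "__main__":
    # Two toy queries: positives at ranks 1 and 3, and a single positive at rank 2.
    rankings = [[True, False, True], [False, True]]
    print(round(mean_average_precision(rankings), 4))
```

Hard negatives lower the score by pushing positives down the ranking: a hard negative ranked above a positive reduces the precision recorded at that positive's rank.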