SLQ: Bridging Modalities via Shared Latent Queries for Retrieval with Frozen MLLMs
arXiv cs.CV / 4/16/2026
Key Points
- The paper proposes SLQ, a framework for adapting frozen multimodal large language models (MLLMs) into retrievers without updating any backbone parameters.
- SLQ appends a small set of Shared Latent Queries to both text and image token sequences, so that the model's causal attention acts as a global aggregation interface and produces compact embeddings in a unified space (a minimal sketch of this idea follows the list).
- The authors argue that retrieval adaptation should elicit existing pre-trained representations rather than overwriting them, to avoid disrupting semantic space and structured knowledge needed for reasoning.
- They introduce KARR-Bench, a benchmark for knowledge-aware reasoning retrieval, designed to evaluate retrieval quality beyond shallow pattern matching.
- Experiments report SLQ outperforming full fine-tuning and LoRA on COCO and Flickr30K, while also performing competitively on MMEB and delivering substantial gains on KARR-Bench.
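
The sketch below illustrates the shared-latent-query mechanism described above, assuming a frozen decoder-only MLLM exposed through a Hugging Face-style interface that accepts `inputs_embeds` and returns per-token hidden states. The module name, query count, and mean pooling over query positions are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumptions noted above): the only trainable parameters are a
# small set of shared latent query vectors appended to the token sequence; the
# MLLM backbone itself stays frozen.
import torch
import torch.nn as nn


class SharedLatentQueryHead(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, num_queries: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # keep the MLLM frozen
            p.requires_grad_(False)
        # Shared latent queries, reused for both text and image inputs so the
        # resulting embeddings land in one unified space.
        self.queries = nn.Parameter(torch.randn(num_queries, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # token_embeds: (B, L, D) embeddings of a caption or of image tokens.
        B = token_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)      # (B, Q, D)
        inputs = torch.cat([token_embeds, q], dim=1)          # append queries at the end
        mask = torch.cat(
            [attention_mask,
             torch.ones(B, q.size(1), device=token_embeds.device, dtype=attention_mask.dtype)],
            dim=1,
        )
        out = self.backbone(inputs_embeds=inputs, attention_mask=mask,
                            output_hidden_states=True)
        # Under causal attention, the appended queries can attend to every
        # content token, so their final hidden states act as a global summary.
        query_states = out.hidden_states[-1][:, -q.size(1):, :]  # (B, Q, D)
        embedding = query_states.mean(dim=1)                     # pool to one vector
        return nn.functional.normalize(embedding, dim=-1)
```

In a setup like this, the query vectors would presumably be trained with a contrastive objective on paired image-text data while the backbone remains untouched, matching the paper's premise that retrieval adaptation should elicit the pre-trained representations rather than overwrite them.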