Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval
arXiv cs.CV / 3/18/2026
Key Points
- Evo-Retriever introduces an LLM-guided curriculum evolution framework with Viewpoint-Pathway collaboration to adapt multimodal document retrieval as the model evolves.
- The method combines multi-view image alignment for fine-grained cross-modal matching with a bidirectional contrastive learning strategy that generates hard queries, establishing complementary learning paths for visual and textual disambiguation.
- A model-state summary is fed into an LLM meta-controller that adaptively adjusts the training curriculum using expert knowledge to guide the model's continual evolution.
- On the ViDoRe V2 and MMEB benchmarks, Evo-Retriever achieves state-of-the-art performance (nDCG@5 of 65.2% and 77.1%, respectively), demonstrating robust gains over prior methods.
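The paper's exact loss is not reproduced here, but the "bidirectional contrastive learning" idea in the second point is commonly implemented as a symmetric InfoNCE objective: the image-to-text and text-to-image retrieval directions are each scored with cross-entropy over in-batch negatives and averaged. A minimal NumPy sketch (function name, temperature value, and toy data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def bidirectional_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over image->text and text->image directions.

    img_emb, txt_emb: (N, D) L2-normalized embeddings; row i of each
    matrix forms a positive pair, all other rows serve as negatives.
    """
    # Cosine-similarity logits, scaled by a temperature hyperparameter.
    logits = img_emb @ txt_emb.T / temperature  # (N, N)
    labels = np.arange(len(logits))             # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # shift for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the two retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy check: three aligned pairs in a 4-d embedding space.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
loss_matched = bidirectional_contrastive_loss(emb, emb)
loss_shuffled = bidirectional_contrastive_loss(emb, np.roll(emb, 1, axis=0))
print(loss_matched < loss_shuffled)  # True: aligned pairs score lower loss
```

The symmetric form matters for retrieval because a query can arrive from either modality; optimizing only one direction can leave the reverse mapping poorly calibrated.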