Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval

arXiv cs.CV / 3/18/2026



Key Points

Evo-Retriever introduces an LLM-guided curriculum evolution framework with Viewpoint-Pathway collaboration, so that the training of a multimodal document retriever adapts as the model evolves.
The method combines multi-view image alignment for fine-grained cross-modal matching with a bidirectional contrastive learning strategy that generates hard queries and establishes complementary learning paths for visual and textual disambiguation.
A model-state summary is fed into an LLM meta-controller that adaptively adjusts the training curriculum using expert knowledge to guide the model's continual evolution.
On ViDoRe V2 and MMEB (VisDoc), Evo-Retriever reports state-of-the-art nDCG@5 scores of 65.2% and 77.1%, respectively.
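Both headline numbers are nDCG@5. The following is a minimal sketch of how that metric is typically computed, assuming the common linear-gain formulation (rel / log2(rank + 1)) and a plain mean over queries reported as a percentage. The benchmarks' official evaluation code may differ in details, for example by using exponential gains (2^rel − 1) for graded relevance. All function names, query ids, and page ids below are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """DCG over the first k ranks, using the linear gain rel / log2(rank + 1)."""
    # enumerate() is 0-based, so 1-based rank = i + 1 and the discount is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_doc_ids, qrels, k=5):
    """nDCG@k for one query.

    ranked_doc_ids: retriever output, best first.
    qrels: dict mapping doc id -> graded relevance (unjudged ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering of all judged docs
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg_at_k(runs, qrels_by_query, k=5):
    """Benchmark-level score: mean nDCG@k over all judged queries, as a percentage."""
    scores = [ndcg_at_k(runs[q], qrels_by_query[q], k) for q in qrels_by_query]
    return 100.0 * sum(scores) / len(scores)

# Usage with hypothetical page ids: q1 finds its two relevant pages at ranks 3 and 5,
# q2 finds its single relevant page at rank 1.
qrels = {"q1": {"p1": 1, "p2": 1}, "q2": {"p4": 1}}
runs = {"q1": ["p3", "p7", "p1", "p9", "p2"], "q2": ["p4", "p5", "p6", "p8", "p0"]}
print(mean_ndcg_at_k(runs, qrels))
```

Because of the logarithmic discount, a relevant page at rank 1 counts fully while one at rank 5 counts for roughly 39% as much, which is why nDCG@5 rewards retrievers that place the right page at the very top.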

Abstract

Visual-language models (VLMs) excel at data mappings, but real-world document heterogeneity and unstructuredness disrupt the consistency of cross-modal embeddings. Recent late-interaction methods enhance image-text alignment through multi-vector representations, yet traditional training with limited samples and static strategies cannot adapt to the model's dynamic evolution, causing cross-modal retrieval confusion. To overcome this, we introduce Evo-Retriever, a retrieval framework featuring an LLM-guided curriculum evolution built upon a novel Viewpoint-Pathway collaboration. First, we employ multi-view image alignment to enhance fine-grained matching via multi-scale and multi-directional perspectives. Then, a bidirectional contrastive learning strategy generates "hard queries" and establishes complementary learning paths for visual and textual disambiguation to rebalance supervision. Finally, the model-state summary from the above collaboration is fed into an LLM meta-controller, which adaptively adjusts the training curriculum using expert knowledge to promote the model's evolution. On ViDoRe V2 and MMEB (VisDoc), Evo-Retriever achieves state-of-the-art performance, with nDCG@5 scores of 65.2% and 77.1%, respectively.
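The abstract builds on late-interaction, multi-vector retrieval and a bidirectional contrastive objective. The sketch below illustrates both ideas under stated assumptions: ColBERT-style MaxSim scoring between query-token and page-patch embeddings, and a symmetric (query-to-page plus page-to-query) InfoNCE loss over an in-batch score matrix. It is not the authors' implementation. The abstract does not specify the loss form, the temperature, or how hard queries are generated, so `temperature=0.05`, the in-batch hardest-negative helper, the random stand-in embeddings, and all function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row vector to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def maxsim(query_vecs, page_vecs):
    """Late-interaction score between one query (Lq, d) and one page (Lp, d).

    Each query token is matched to its most similar page vector, and the
    per-token maxima are summed (ColBERT-style MaxSim).
    """
    sim = l2_normalize(query_vecs) @ l2_normalize(page_vecs).T  # (Lq, Lp) cosine similarities
    return float(sim.max(axis=1).sum())

def score_matrix(queries, pages):
    """S[i, j] = MaxSim(query i, page j); the positive pairs sit on the diagonal."""
    return np.array([[maxsim(q, p) for p in pages] for q in queries])

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def bidirectional_info_nce(S, temperature=0.05):
    """Symmetric InfoNCE, a hypothetical stand-in for the paper's bidirectional loss.

    Query->page term: each query must score its own page above the other pages (rows).
    Page->query term: each page must score its own query above the other queries (columns).
    """
    logits = S / temperature
    idx = np.arange(S.shape[0])
    q2p = -log_softmax(logits, axis=1)[idx, idx].mean()
    p2q = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (q2p + p2q))

def hardest_negative_pages(S):
    """For each query, the index of the highest-scoring page that is NOT its positive.

    A simple in-batch hard-negative heuristic (hypothetical); the paper's
    hard-query generation is not described in the abstract.
    """
    masked = S.astype(float)  # astype returns a float copy, so S itself is untouched
    np.fill_diagonal(masked, -np.inf)
    return masked.argmax(axis=1)

# Usage with random stand-in embeddings: 4 pages of 16 patch vectors (d = 8),
# and 3-token queries built as noisy copies of patches from their own page.
rng = np.random.default_rng(0)
pages = [rng.normal(size=(16, 8)) for _ in range(4)]
queries = [p[:3] + 0.1 * rng.normal(size=(3, 8)) for p in pages]
S = score_matrix(queries, pages)
print(S.argmax(axis=1), bidirectional_info_nce(S), hardest_negative_pages(S))
```

Scoring in both directions means a page that attracts many wrong queries is penalized through its column just as a query that prefers a wrong page is penalized through its row. This is one plausible reading of "complementary learning paths"; the hard negatives returned by the helper are the kind of confusable pairs that such a curriculum would emphasize.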
