POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch
arXiv cs.CV / 4/16/2026
Key Points
- The paper argues that large multimodal models (LMMs) are constrained by their static parametric knowledge and therefore need active multimodal search to retrieve evidence from the external world.
- It proposes building a multimodal agentic search model end-to-end rather than retrofitting an existing LMM with search as an add-on module.
- The authors introduce “Agentic Seeding” to create training conditions that elicit agent-like behaviors from the start.
- They identify a long-horizon interaction bottleneck where growing dialogue history makes it harder to find ground-truth evidence, and they mitigate it with “V-Fold,” an adaptive history-aware compression approach.
- They release “POINTS-Seeker-8B,” which they report as outperforming prior multimodal agentic search models across six benchmarks, specifically improving long-horizon, knowledge-intensive visual reasoning.
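The paper names V-Fold as an adaptive, history-aware compression approach but the summary above gives no implementation details. As a purely illustrative sketch (the function, turn format, character-based cost estimate, and truncation-stub strategy below are all assumptions, not the authors' method), generic history compression for a long-horizon agent might look like:

```python
# Hypothetical sketch of history-aware compression for a search agent.
# Assumptions (NOT from the paper): turns are dicts with "role" and
# "content", cost is approximated by character count, and older turns
# are folded into a single stub instead of model-generated summaries.

def compress_history(turns, budget_chars=200, keep_recent=2):
    """Fold older turns into one stub when history exceeds the budget."""
    total = sum(len(t["content"]) for t in turns)
    if total <= budget_chars or len(turns) <= keep_recent:
        return turns  # history still fits; leave it untouched
    recent = turns[-keep_recent:]          # always keep the latest turns
    folded = turns[:-keep_recent]          # older turns get folded away
    stub = {
        "role": "system",
        "content": "[folded %d earlier turns: %s]"
        % (len(folded), "; ".join(t["content"][:20] for t in folded)),
    }
    return [stub] + recent
```

A real system would replace the truncation stub with a learned or model-generated summary and measure cost in tokens rather than characters; the point of the sketch is only the shape of the trade-off the paper describes, trading older context for a bounded prompt.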