Mind DeepResearch Technical Report

arXiv cs.AI / April 17, 2026


Key Points

  • Mind DeepResearch (MindDR) is introduced as an efficient multi-agent deep research framework that delivers leading results using only ~30B-parameter models via a tailored data synthesis and multi-stage training pipeline.
  • The system’s key design is a collaborative three-agent setup (Planning Agent, DeepSearch Agent, and Report Agent) combined with four specialized training stages: SFT cold-start, Search-RL, Report-RL, and preference alignment.
  • Reported evaluations show MindDR outperforming comparable-scale open-source agent systems and approaching the performance of larger-scale models across multiple benchmarks, including BrowseComp-ZH, BrowseComp, WideSearch, xbench-DS, and DeepResearch Bench.
  • The paper also states that MindDR has been deployed as an online product at Li Auto, and it introduces MindDR Bench, a set of 500 real-world Chinese queries assessed with a multi-dimensional rubric rather than a single metric; on this benchmark, MindDR reaches a state-of-the-art score of 51.8.
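To make the collaborative three-agent flow concrete, here is a minimal, purely hypothetical sketch of how a Planning Agent, DeepSearch Agent, and Report Agent could be wired together. None of the class or function names below come from the report; the agents themselves (LLM calls, web search, tool use) are stubbed with plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical orchestration of the three-agent pipeline described above.
# Each agent is modeled as a callable so the control flow stays visible.
@dataclass
class ResearchPipeline:
    plan: Callable[[str], List[str]]           # Planning Agent: query -> subtasks
    search: Callable[[str], str]               # DeepSearch Agent: subtask -> evidence
    report: Callable[[str, List[str]], str]    # Report Agent: query + evidence -> report

    def run(self, query: str) -> str:
        subtasks = self.plan(query)                      # decompose the research query
        evidence = [self.search(t) for t in subtasks]    # gather evidence per subtask
        return self.report(query, evidence)              # synthesize the final report

# Toy stand-ins so the sketch runs end to end.
pipeline = ResearchPipeline(
    plan=lambda q: [f"background of {q}", f"recent results on {q}"],
    search=lambda t: f"[evidence for: {t}]",
    report=lambda q, ev: f"Report on {q}:\n" + "\n".join(ev),
)

print(pipeline.run("30B-scale deep research agents"))
```

In the actual system each stage is a trained ~30B model (tuned via SFT cold-start, Search-RL, Report-RL, and preference alignment, respectively); the sketch only illustrates the division of labor between the three agents.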

Abstract

We present Mind DeepResearch (MindDR), an efficient multi-agent deep research framework that achieves leading performance with only ~30B-parameter models through a meticulously designed data synthesis and multi-stage training pipeline. The core innovation of MindDR lies in a collaborative three-agent architecture (Planning Agent, DeepSearch Agent, and Report Agent) and a four-stage agent-specialized training pipeline comprising SFT cold-start, Search-RL, Report-RL, and preference alignment. With this regime, MindDR demonstrates competitive performance even with ~30B-scale models. Specifically, MindDR achieves 45.7% on BrowseComp-ZH, 42.8% on BrowseComp, 46.5% on WideSearch, 75.0% on xbench-DS, and 52.5 on DeepResearch Bench, outperforming comparable-scale open-source agent systems and rivaling larger-scale models. MindDR has been deployed as an online product in Li Auto. Furthermore, we introduce MindDR Bench, a curated benchmark of 500 real-world Chinese queries from our internal product's user interactions, evaluated through a comprehensive multi-dimensional rubric system rather than relying on a single RACE metric. On MindDR Bench, MindDR achieves a state-of-the-art score of 51.8.