DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

arXiv cs.LG / 4/23/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsModels & Research

共有:

Key Points

The paper introduces DR-Venus, a “frontier” 4B-parameter deep research agent designed for edge-scale deployment using only open data to meet cost, latency, and privacy needs.
It proposes a two-stage training approach: agentic supervised fine-tuning with strict data cleaning and resampling of long-horizon trajectories to improve data quality and utilization.
It then applies agentic reinforcement learning to strengthen long-horizon execution reliability, using IGPO plus turn-level rewards based on information gain and format-aware regularization to improve supervision density and credit assignment.
Despite relying on roughly 10K open-data examples, DR-Venus-4B outperforms prior agentic models up to 9B parameters and narrows the gap versus much larger 30B-class systems across multiple deep research benchmarks.
The authors release models, code, and key training recipes to enable reproducible research on edge-scale deep research agents.

Abstract

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent under limited open-data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Dev.to

Elevating Austria: Google invests in its first data center in the Alps.

Google Blog

10 AI Tools Every Developer Should Try in 2026

Dev.to

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

Dev.to

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Key Points

Abstract

Related Articles

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Elevating Austria: Google invests in its first data center in the Alps.

10 AI Tools Every Developer Should Try in 2026

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer