AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
arXiv cs.LG · March 20, 2026
Key Points
- AgentDS introduces a benchmark and competition to evaluate AI agents and human-AI collaboration in domain-specific data science across six industries.
- The open competition involved 29 teams and 80 participants, enabling systematic comparisons between AI-only baselines and human-AI collaborative approaches.
- Findings indicate that current AI agents struggle with domain-specific reasoning, with AI-only baselines performing near or below the median of human participants.
- Remarkably, the strongest results come from human-AI collaboration, underscoring that fully autonomous AI is not yet sufficient for domain-specific data science.
- The project provides open-source datasets on HuggingFace and an official website (agentds.org) for ongoing benchmarking.
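The headline finding above can be sketched numerically. The scores below are hypothetical placeholders (the report's actual leaderboard values are not given here); the sketch only illustrates the reported pattern: the AI-only baseline lands near or below the median human score, while the best result comes from a human-AI team.

```python
from statistics import median

# Hypothetical leaderboard scores (NOT from the report); higher is better.
human_scores = [0.41, 0.47, 0.52, 0.55, 0.58, 0.63, 0.70]
ai_only_baseline = 0.50       # assumed score for the autonomous AI agent
best_collaborative = 0.70     # assumed score for the top human-AI team

human_median = median(human_scores)

# The reported pattern: AI-only near or below the human median,
# with human-AI collaboration producing the strongest result.
assert ai_only_baseline <= human_median
assert best_collaborative >= max(human_scores)
print(f"human median={human_median}, AI-only={ai_only_baseline}, "
      f"best collaboration={best_collaborative}")
```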