AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
arXiv cs.LG · March 20, 2026
Key Points
- AgentDS introduces a benchmark and competition to evaluate AI agents and human-AI collaboration in domain-specific data science across six industries.
- The open competition involved 29 teams and 80 participants, enabling systematic comparisons between AI-only baselines and human-AI collaborative approaches.
- Findings indicate that current AI agents struggle with domain-specific reasoning, with AI-only baselines performing near or below the median of human participants.
- Remarkably, the strongest results come from human-AI collaboration, underscoring that fully autonomous AI is not yet sufficient for domain-specific data science.
- The project provides open-source datasets on Hugging Face and an official website (agentds.org) for ongoing benchmarking.