AIDABench: AI Data Analytics Benchmark
arXiv cs.AI / 3/18/2026
Key Points
- AIDABench is an end-to-end benchmark of 600+ document analytics tasks covering three capabilities: question answering, data visualization, and file generation.
- Tasks involve realistic, heterogeneous data such as spreadsheets, databases, financial reports, and operational records, spanning diverse industries and job functions.
- Across 11 evaluated models (proprietary and open-source), the best pass-at-1 score is 59.43%, which shows how far current systems remain from reliable real-world data analytics.
- The paper analyzes failure modes, identifies key research challenges, and positions AIDABench as a reference for enterprise procurement and model optimization. The benchmark is publicly available on GitHub.
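
For readers unfamiliar with the headline metric, here is a minimal sketch of how a pass-at-1 score (often written pass@1) can be computed. This summary does not say how AIDABench scores its tasks, so the sketch assumes the widely used unbiased pass@k estimator averaged over tasks. The function names and task ids are hypothetical and are not taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n attempts sampled, c of them passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a pass.
        return 1.0
    # 1 minus the probability that k attempts drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Mean pass@1 over tasks; results maps a task id to per-attempt pass flags."""
    if not results:
        raise ValueError("results must contain at least one task")
    scores = [pass_at_k(len(flags), sum(flags), 1) for flags in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical outcomes for three tasks, one per capability.
    demo = {
        "qa-001": [True],
        "viz-001": [False],
        "file-001": [True, False],
    }
    print(f"pass@1 = {benchmark_pass_at_1(demo):.2%}")
```

With k = 1 the estimator reduces to the fraction of attempts that pass on each task, so with a single attempt per task the score is simply the share of tasks solved on the first try.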