DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling
arXiv cs.CL / 4/9/2026
Key Points
- DataSTORM is an LLM-based agentic system designed for deep research across large-scale structured databases in addition to internet sources, addressing gaps left by web-focused methods.
- The approach frames deep structured-data research as a thesis-driven process using Exploratory Data Analysis and Data Storytelling: generating candidate theses, validating them via iterative cross-source investigation, and converging on a coherent narrative.
- Evaluations on InsightBench show DataSTORM achieving new state-of-the-art performance, improving insight-level recall by 19.4% and summary-level scores by 7.2% relative to prior methods.
- The paper also introduces a dataset derived from ACLED (Armed Conflict Location & Event Data) and reports that DataSTORM outperforms proprietary systems such as ChatGPT Deep Research on both automated metrics and human evaluations.
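The thesis-driven loop described in the key points (propose candidate theses, validate them through repeated investigation, converge on the best-supported ones, then narrate) can be illustrated with the minimal sketch below. It is not DataSTORM's implementation. The names `Thesis`, `thesis_driven_research`, `narrate`, `propose`, and `investigate` are hypothetical. `investigate` stands in for whatever cross-source step queries the database and the web, and the halve-each-round pruning rule is an assumption chosen only to show convergence.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Thesis:
    statement: str
    support: float = 0.0                                # accumulated evidence score
    evidence: list[str] = field(default_factory=list)   # notes from each check


def thesis_driven_research(
    propose: Callable[[], list[str]],                   # hypothetical: LLM proposes theses
    investigate: Callable[[str], tuple[float, str]],    # hypothetical: DB + web check
    rounds: int = 3,
    keep: int = 2,
) -> list[Thesis]:
    """Propose theses, validate them over several rounds, keep the best-supported."""
    theses = [Thesis(s) for s in propose()]
    for _ in range(rounds):
        for t in theses:
            score, note = investigate(t.statement)
            t.support += score
            t.evidence.append(note)
        # Converge: rank by support and drop roughly the weaker half each round.
        theses.sort(key=lambda t: t.support, reverse=True)
        theses = theses[: max(keep, len(theses) // 2)]
    return theses[:keep]


def narrate(theses: list[Thesis]) -> str:
    """Stand-in for the data-storytelling step: join surviving theses into a summary."""
    return " ".join(f"{t.statement} ({len(t.evidence)} checks)." for t in theses)
```

With toy callables, for example `propose=lambda: ["A", "B", "C", "D"]` and an `investigate` that returns fixed scores, the weakly supported theses are dropped after the first round and only the top two survive into the narrative.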
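"Insight-level recall" generally means the fraction of ground-truth insights that a system's output recovers. The sketch below shows that idea under a loud assumption: a simple token-overlap (Jaccard) threshold decides whether a predicted insight matches a ground-truth one. InsightBench's actual matching procedure may differ, and the function names `jaccard` and `insight_recall` and the `0.5` threshold are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; an assumed stand-in for the benchmark's matcher."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def insight_recall(predicted: list[str], ground_truth: list[str], threshold: float = 0.5) -> float:
    """Fraction of ground-truth insights matched by at least one predicted insight."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for g in ground_truth
        if any(jaccard(g, p) >= threshold for p in predicted)
    )
    return hits / len(ground_truth)
```

For instance, with ground truth `["sales rose in q3", "churn is highest in region west"]` and a single prediction `"sales rose in q3 overall"`, only the first insight is matched, so recall is 0.5.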