EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

arXiv cs.AI / 3/20/2026

Key Points

  • EDM-ARS presents a domain-specific multi-agent pipeline that automates end-to-end educational data mining research by embedding educational expertise at every stage of the workflow.
  • It coordinates five LLM-powered agents—ProblemFormulator, DataEngineer, Analyst, Critic, and Writer—via a state-machine controller that supports revision loops, checkpoint-based recovery, and sandboxed code execution.
  • Given a research prompt and a dataset, EDM-ARS can generate a complete LaTeX manuscript with real Semantic Scholar citations, validated machine learning analyses, and automated methodological peer review.
  • The report details the system architecture, a three-tier data registry design, agent specifications, the inter-agent communication protocol, and error-handling and self-correction mechanisms, while also acknowledging limitations and a roadmap toward future capabilities.

Abstract

In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first instantiation of this framework, we focus on predictive modeling tasks. Within this scope, EDM-ARS orchestrates five specialized LLM-powered agents (ProblemFormulator, DataEngineer, Analyst, Critic, and Writer) through a state-machine coordinator that supports revision loops, checkpoint-based recovery, and sandboxed code execution. Given a research prompt and a dataset, EDM-ARS produces a complete LaTeX manuscript with real Semantic Scholar citations, validated machine learning analyses, and automated methodological peer review. We also provide a detailed description of the system architecture, the three-tier data registry design that encodes educational domain expertise, the specification of each agent, the inter-agent communication protocol, and mechanisms for error handling and self-correction. Finally, we discuss current limitations, including single-dataset scope and formulaic paper output, and outline a phased roadmap toward causal inference, transfer learning, psychometrics, and multi-dataset generalization. EDM-ARS is released as an open-source project to support the educational research community.
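
To make the coordinator idea concrete, the following is a minimal Python sketch of a state machine that walks the five named agents in order, allows a revision loop, and writes a checkpoint after every step so an interrupted run can resume. It is an illustration under stated assumptions, not the EDM-ARS implementation: the abstract names the agents and the three coordinator features, but the transition order, the choice to send revisions from the Critic back to the Analyst, the `max_revisions` cap, the JSON checkpoint layout, and the names `Stage`, `run_pipeline`, and `make_stub_agents` are all hypothetical. The agents are stubs that make no LLM calls, and sandboxed code execution is not modeled.

```python
import json
from enum import Enum
from pathlib import Path


class Stage(str, Enum):
    # Agent names come from the abstract; DONE is a hypothetical terminal state.
    FORMULATE = "ProblemFormulator"
    ENGINEER = "DataEngineer"
    ANALYZE = "Analyst"
    CRITIQUE = "Critic"
    WRITE = "Writer"
    DONE = "Done"


# Assumed forward transitions (the source does not spell these out).
NEXT = {
    Stage.FORMULATE: Stage.ENGINEER,
    Stage.ENGINEER: Stage.ANALYZE,
    Stage.ANALYZE: Stage.CRITIQUE,
    Stage.CRITIQUE: Stage.WRITE,
    Stage.WRITE: Stage.DONE,
}


def run_pipeline(agents, checkpoint: Path, max_revisions: int = 2) -> dict:
    """Run the stages in order, resuming from `checkpoint` if it exists."""
    if checkpoint.exists():
        # Checkpoint-based recovery: pick up at the last saved stage.
        state = json.loads(checkpoint.read_text())
    else:
        state = {"stage": Stage.FORMULATE.value, "revisions": 0, "history": []}

    while state["stage"] != Stage.DONE.value:
        stage = Stage(state["stage"])
        verdict = agents[stage](state)  # each stub agent returns a short verdict
        state["history"].append(stage.value)

        # Revision loop (assumption): a "revise" verdict from the Critic sends
        # the run back to the Analyst, up to max_revisions times.
        if (
            stage is Stage.CRITIQUE
            and verdict == "revise"
            and state["revisions"] < max_revisions
        ):
            state["revisions"] += 1
            state["stage"] = Stage.ANALYZE.value
        else:
            state["stage"] = NEXT[stage].value

        # Persist after every step so a crash loses at most one stage of work.
        checkpoint.write_text(json.dumps(state))
    return state


def make_stub_agents() -> dict:
    """Hypothetical stand-ins for the LLM agents; the Critic rejects once."""
    agents = {stage: (lambda state: "ok") for stage in NEXT}
    agents[Stage.CRITIQUE] = (
        lambda state: "revise" if state["revisions"] == 0 else "accept"
    )
    return agents
```

Calling `run_pipeline(make_stub_agents(), Path("state.json"))` runs each stub once, loops from the Critic back to the Analyst a single time, and then proceeds to the Writer. Keeping the whole run state in one JSON-serializable dictionary is what makes the checkpoint trivial to write and reload in this sketch; a real system would also need to persist agent outputs and artifacts.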