Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

arXiv cs.AI / 5/1/2026

📰 News · Models & Research

Key Points

  • The paper introduces MEDS (Math Education Digital Shadows), a new dataset designed to measure how LLMs reason about and report math under both human-like and AI-assistant-like conditions.
  • MEDS is built from 28,000 personas across 14 LLMs (including Mistral, Qwen, DeepSeek, Granite, Phi, and Grok), with each record (“shadow”) containing math prompts plus psychological and sociodemographic persona metadata.
  • The dataset goes beyond traditional math benchmarks by including task types and measures tied to self-efficacy, math anxiety, cognitive networks/attitudes, and confidence—not just math accuracy.
  • Validation results indicate schema integrity and consistent persona behavior, while also revealing family-specific patterns such as human-like negative math attitudes, logical fallacies, and overconfidence.
  • MEDS is intended to support learning analytics, cognitive science research, and the development of safer math tutoring AI systems.

Abstract

To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows), a dataset mapping how large language models reason about and report mathematics across human- and AI-like conditions. MEDS comprises 28,000 personas from 14 LLMs (from families like Mistral, Qwen, DeepSeek, Granite, Phi, and Grok) shadowing either humans or AI assistants. Each record (“shadow”) includes a set of prompts along with psychological/sociodemographic persona metadata and four types of math tasks: (i) an open math interview, (ii) three psychometric tests about math perceptions with explanations, (iii) cognitive networks capturing math attitudes, and (iv) 18 high-school math test questions together with their reasoning and confidence scores. MEDS differs from traditional score-only math benchmarks because it integrates concepts of self-efficacy, math anxiety, and cognitive network science alongside math proficiency scores. Data validation shows that the sampled LLMs exhibit schema integrity and consistent personas, together with family-specific peculiarities like human-like negative math attitudes, logical fallacies, and math overconfidence. MEDS will benefit learning analytics experts, cognitive scientists, and developers of safer AI tutors in mathematics.
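To make the record structure described in the abstract concrete, here is a minimal sketch of what a MEDS record/shadow might look like as a Python dataclass. All field and class names here are illustrative assumptions for exposition; the dataset's actual schema is defined in the paper, not here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class MathTestItem:
    """One of the 18 high-school math questions (hypothetical field names)."""
    question: str
    answer: str
    reasoning: str     # the model's written reasoning for its answer
    confidence: float  # self-reported confidence score

@dataclass
class MedsShadow:
    """A single record/"shadow": one persona sampled from one LLM.

    Field names are assumptions made for illustration, not the real schema.
    """
    model: str                            # e.g. a model from the Mistral or Qwen family
    condition: str                        # "human" or "ai_assistant" shadowing condition
    persona: Dict[str, str]               # psychological/sociodemographic metadata
    interview: str                        # (i) open math interview transcript
    psychometrics: Dict[str, str]         # (ii) psychometric tests with explanations
    attitude_network: List[Tuple[str, str]]  # (iii) cognitive-network edges for math attitudes
    math_test: List[MathTestItem] = field(default_factory=list)  # (iv) 18 test items

# A minimal record under this hypothetical schema:
shadow = MedsShadow(
    model="example-model",
    condition="human",
    persona={"age_group": "high_school", "math_anxiety": "high"},
    interview="...",
    psychometrics={"self_efficacy": "..."},
    attitude_network=[("math", "anxiety")],
    math_test=[MathTestItem("2 + 2 = ?", "4", "Basic addition.", 0.99)],
)
print(shadow.condition)  # → human
```

The point of the sketch is the shape of the data: one persona per record, a shadowing condition, and the four task types bundled together rather than a single accuracy score.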