Mathematics Teachers' Interactions with a Multi-Agent System for Personalized Problem Generation

arXiv cs.AI / 4/15/2026

Key Points

The paper studies a teacher-in-the-loop multi-agent system that generates personalized middle school math problems from a teacher-supplied base problem and target topic using LLMs.
Four specialized AI agents evaluate generated problems on mathematical accuracy, authenticity, readability, and realism to support quality control during problem writing.
In a classroom deployment via ASSISTments, eight middle school math teachers generated 212 problems and assigned them to students, showing practical classroom use of the system.
Both teachers and students wanted to modify fine-grained personalized elements of the problems' real-world contexts, signaling issues with authenticity and fit.
Although the agents flagged many realism issues while problems were being written, teachers and students noted few in the final versions; readability issues and mathematical hallucinations were also somewhat rare.

Abstract

Large language models can increasingly adapt educational tasks to learners' characteristics. In the present study, we examine a multi-agent teacher-in-the-loop system for personalizing middle school math problems. The teacher enters a base problem and desired topic, the LLM generates the problem, and then four AI agents evaluate the problem using criteria that each specializes in (mathematical accuracy, authenticity, readability, and realism). Eight middle school mathematics teachers created 212 problems in ASSISTments using the system and assigned these problems to their students. We find that both teachers and students wanted to modify the fine-grained personalized elements of the real-world context of the problems, signaling issues with authenticity and fit. Although the agents detected many issues with realism as the problems were being written, there were few realism issues noted by teachers and students in the final versions. Issues with readability and mathematical hallucinations were also somewhat rare. Implications for multi-agent systems for personalization that support teacher control are given.
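The generate-then-evaluate flow described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the paper's generator and evaluators are LLM-based, whereas every function here (`generate_problem`, `long_sentences`, `no_issues`, `personalize`) is a hypothetical placeholder, and the 25-word readability threshold is an arbitrary assumption chosen only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import List

# The four evaluation criteria named in the abstract.
CRITERIA = ("mathematical accuracy", "authenticity", "readability", "realism")


@dataclass
class Review:
    criterion: str
    issues: List[str]


@dataclass
class Draft:
    text: str
    reviews: List[Review] = field(default_factory=list)

    def flagged(self) -> List[str]:
        """Criteria whose evaluator reported at least one issue."""
        return [r.criterion for r in self.reviews if r.issues]


def generate_problem(base_problem: str, topic: str) -> str:
    # Hypothetical stand-in for the LLM generation step.
    return f"At the {topic} tournament: {base_problem}"


def no_issues(text: str) -> List[str]:
    # Placeholder evaluator that never reports anything.
    return []


def long_sentences(text: str, limit: int = 25) -> List[str]:
    # Toy readability check (assumption): flag sentences over `limit` words.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [f"long sentence: {s[:30]}..." for s in sentences if len(s.split()) > limit]


# One evaluator per criterion; a real system would call an LLM agent here.
CHECKS = {
    "mathematical accuracy": no_issues,
    "authenticity": no_issues,
    "readability": long_sentences,
    "realism": no_issues,
}


def personalize(base_problem: str, topic: str) -> Draft:
    """Generate one draft, then collect a review from each evaluator."""
    text = generate_problem(base_problem, topic)
    reviews = [Review(c, CHECKS[c](text)) for c in CRITERIA]
    return Draft(text=text, reviews=reviews)


draft = personalize(
    "A team scores 3 points per goal. How many points for 4 goals?", "soccer"
)
print(draft.text)
print(draft.flagged())
```

In this sketch the evaluators only report issues; they never rewrite the draft. That mirrors the teacher-in-the-loop framing, in which the teacher reads the flags and decides what to change before assigning the problem.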