Elementary Math Word Problem Generation using Large Language Models

arXiv cs.CL / 3/27/2026


Key Points

  • The paper introduces MathWiz, an LLM-based system for generating elementary math word problems without requiring tutors to provide partial prompts or additional equation information.
  • The system takes only three inputs—number of problems, grade level, and question type (e.g., addition or subtraction)—to produce practice-ready MWPs.
  • Extensive experiments compare different LLMs and prompting strategies, including methods to improve diversity of generated problems and approaches that incorporate human feedback.
  • Human and automated evaluations suggest the generated MWPs generally have high quality with minimal spelling and grammar errors.
  • The authors find that LLMs still have difficulty strictly meeting the specified grade and question-type constraints.

Abstract

Mathematics is often perceived as a complex subject by students, leading to high failure rates in exams. To improve Mathematics skills, it is important to provide sample questions for students to practice problem-solving. Manually creating Math Word Problems (MWPs) is time-consuming for tutors, because they must write in natural language while adhering to its grammar and spelling rules. Early techniques that use pre-trained Language Models for MWP generation require a tutor to provide the initial portion of the MWP, additional information such as an equation, or both. In this paper, we present an MWP generation system (MathWiz) based on Large Language Models (LLMs) that overcomes the need for such additional input: the only inputs to our system are the number of MWPs needed, the grade, and the type of question (e.g., addition, subtraction). Unlike existing LLM-based solutions for MWP generation, we carried out an extensive set of experiments involving different LLMs, prompting strategies, techniques to improve the diversity of MWPs, and techniques that employ human feedback to improve LLM performance. Human and automated evaluations confirmed that the generated MWPs are high in quality, with minimal spelling and grammar issues. However, LLMs still struggle to generate questions that adhere to the specified grade and question-type requirements.
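The three-input interface described in the abstract can be sketched as a simple prompt builder. The function name and prompt wording below are illustrative assumptions, not the authors' actual prompt; the paper does not publish its exact templates.

```python
def build_mwp_prompt(num_problems: int, grade: int, question_type: str) -> str:
    """Assemble an LLM prompt from the system's three inputs.

    Hypothetical wording -- a minimal sketch of the three-input idea,
    not MathWiz's real prompt.
    """
    return (
        f"Generate {num_problems} math word problems suitable for "
        f"grade {grade} students. Each problem must be solvable using only "
        f"{question_type}. Use correct spelling and grammar, and vary the "
        f"names, objects, and scenarios across problems."
    )

prompt = build_mwp_prompt(5, 3, "addition")
print(prompt)
```

The resulting string would then be sent to the chosen LLM; the paper's contribution lies in comparing models, prompting strategies, diversity techniques, and human-feedback approaches around this kind of minimal interface.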