Understanding Teacher Revisions of Large Language Model-Generated Feedback

arXiv cs.CL / March 31, 2026


Key Points

  • The paper studies how 117 teachers revise large-language-model (LLM) formative feedback, analyzing 1,349 paired instances of AI-generated feedback and teacher-edited explanations.
  • Teachers accept AI feedback unchanged about 80% of the time; the feedback they do edit tends to be longer than average, and their edits typically shorten it.
  • Revision behavior is highly uneven across teachers: roughly half never edit, and only about 10% edit more than two-thirds of instances.
  • A model using only the AI feedback text can predict whether teachers will revise with fair accuracy (AUC = 0.75), suggesting revision signals are detectable from the original text; a rough sketch of such a classifier follows this list.
  • When teachers do revise, they often simplify the AI feedback, shifting it from high-information explanations toward more concise, corrective feedback that better matches teacher priorities.
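For readers who want to see what such a text-only revision classifier could look like, here is a minimal sketch, assuming a sentence-embedding encoder feeding a linear classifier. This is not the paper's actual pipeline; the embedding model, file name, and column names are illustrative assumptions.

```python
# Minimal sketch of a text-only "will this feedback be revised?" classifier.
# Hypothetical setup: the embedding model, CSV file, and column names below
# are assumptions for illustration, not the paper's pipeline.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# One row per feedback instance: the AI-generated text and a binary label
# indicating whether the teacher edited it.
df = pd.read_csv("feedback_instances.csv")  # columns: ai_feedback, was_revised

# Encode each AI feedback text as a fixed-size sentence embedding.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(df["ai_feedback"].tolist())
y = df["was_revised"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# A simple linear classifier over the embeddings; class_weight="balanced"
# matters because roughly 80% of instances are accepted unchanged.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)

# Evaluate with AUC, the metric the paper reports (0.75 there).
probs = clf.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, probs):.2f}")
```

A linear model over embeddings is a reasonable baseline for this kind of binary prediction; with a heavily imbalanced label, ranking metrics such as AUC are more informative than raw accuracy.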

Abstract

Large language models (LLMs) increasingly generate formative feedback for students, yet little is known about how teachers revise this feedback before it reaches learners. Teachers' revisions shape what students receive, making revision practices central to evaluating AI classroom tools. We analyze a dataset of 1,349 instances of AI-generated feedback and corresponding teacher-edited explanations from 117 teachers. We examine (i) textual characteristics associated with teacher revisions, (ii) whether revision decisions can be predicted from the AI feedback text, and (iii) how revisions change the pedagogical type of feedback delivered. First, we find that teachers accept AI feedback without modification in about 80% of cases, while the feedback teachers do edit tends to be significantly longer than average and is typically shortened in revision. Editing behavior varies substantially across teachers: about 50% never edit AI feedback, and only about 10% edit more than two-thirds of feedback instances. Second, machine learning models that take only sentence embeddings of the AI feedback text as input features achieve fair performance in identifying which feedback will be revised (AUC = 0.75). Third, qualitative coding shows that when revisions occur, teachers often simplify AI-generated feedback, shifting it away from high-information explanations toward more concise, corrective forms. Together, these findings characterize how teachers engage with AI-generated feedback in practice and highlight opportunities to design feedback systems that better align with teacher priorities while reducing unnecessary editing effort.
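To make the descriptive statistics above concrete, here is a small sketch of how the acceptance rate, length comparison, and per-teacher edit rates could be computed, assuming the same hypothetical schema as the earlier sketch plus `teacher_feedback` and `teacher_id` columns (all column names are assumptions, not the paper's).

```python
# Sketch of the descriptive analyses summarized in the abstract, under a
# hypothetical schema: ai_feedback, teacher_feedback, teacher_id columns.
import pandas as pd

df = pd.read_csv("feedback_instances.csv")

# A revision is any instance where the delivered text differs from the AI text.
df["was_revised"] = (
    df["ai_feedback"].str.strip() != df["teacher_feedback"].str.strip()
)

# Share of feedback accepted verbatim (~80% in the paper).
print("accepted unchanged:", 1 - df["was_revised"].mean())

# Length comparison: is edited AI feedback longer on average, and do
# teachers shorten it? (Word counts as a simple length proxy.)
df["ai_len"] = df["ai_feedback"].str.split().str.len()
df["teacher_len"] = df["teacher_feedback"].str.split().str.len()
print(df.groupby("was_revised")["ai_len"].mean())
revised = df[df["was_revised"]]
print("mean length change when revised:",
      (revised["teacher_len"] - revised["ai_len"]).mean())

# Per-teacher edit rates: ~50% never edit, ~10% edit over two-thirds.
edit_rate = df.groupby("teacher_id")["was_revised"].mean()
print("share who never edit:", (edit_rate == 0).mean())
print("share who edit > 2/3:", (edit_rate > 2 / 3).mean())
```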