Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

arXiv cs.AI / 4/14/2026


Key Points

  • The paper proposes a way to train open-weight “programming learner” language models by serializing real students' temporal log traces into a dialogue-style conversational format.
  • In the serialized format, alternating turns capture student code submissions and automated assessment/environment feedback (tests, grades, and error traces) to teach models iterative debugging behavior.
  • The training pipeline combines supervised fine-tuning with preference optimization to better align the learner model’s responses with authentic student debugging patterns.
  • Experiments on Qwen models (4B and 8B) trained with real Python assignment submission data show that including environment feedback improves functional alignment and code similarity versus code-only and prompted LLM baselines.
  • The authors release code to support reproducibility and to reduce reliance on proprietary prompting approaches for large-scale tutoring strategy evaluation.
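The serialization idea above can be illustrated with a minimal sketch. The field names and structure here are hypothetical (the paper's actual log schema is not given in this summary): each logged submission becomes an "assistant" turn spoken by the simulated student, and the autograder's response becomes a "user" turn spoken by the environment.

```python
def serialize_trace(events):
    """Convert a student's chronological log events into chat-style turns.

    Each event is a dict with illustrative keys:
      "timestamp" - ordering key for the submission,
      "code"      - the submitted source,
      "feedback"  - autograder output (test results, grade, error trace).
    The student plays the "assistant" role; the environment plays "user",
    so the model learns to produce the next code attempt given prior feedback.
    """
    messages = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        messages.append({"role": "assistant", "content": event["code"]})
        messages.append({"role": "user", "content": event["feedback"]})
    return messages


# Example: a two-attempt debugging trace becomes four alternating turns.
trace = [
    {"timestamp": 1, "code": "def add(a, b): return a - b",
     "feedback": "FAILED test_add: expected 3, got -1"},
    {"timestamp": 2, "code": "def add(a, b): return a + b",
     "feedback": "PASSED all tests (grade: 100)"},
]
dialogue = serialize_trace(trace)
```

Framing the trace this way lets standard chat-format fine-tuning tooling consume the data directly, with the loss applied to the student-authored turns.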

Abstract

Artificial models that simulate how learners act and respond within educational systems are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, many existing approaches in programming education rely on prompting large, proprietary language models, raising concerns around privacy, cost, and dependence. In this work, we propose a method for training open-weight artificial programming learners using authentic student process data. Our approach serializes temporal log traces into a conversational format, representing each student's problem-solving process as a dialogue between the learner and their automated assessment system. Student code submissions and environment feedback, such as test outcomes, grades, and error traces, form alternating conversational turns, enabling models to learn from the iterative debugging process. We additionally introduce a training pipeline combining supervised fine-tuning with preference optimization to align models with authentic student debugging behavior. We evaluate our framework by training Qwen models at 4B and 8B scales on a large-scale dataset of real student submissions to Python programming assignments. Our results show that incorporating environment feedback strengthens the models' ability to replicate student debugging behavior, improving over both prior code-only approaches and prompted large language model baselines in functional alignment and code similarity. We release our code to support reproducibility.
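The abstract does not spell out how the preference-optimization data is built. One plausible data-preparation step, shown below purely as an illustrative sketch and not the authors' actual recipe, pairs the authentic student's next submission (as the "chosen" response) against sampled model completions (as "rejected"), the triple format that DPO-style preference trainers typically expect.

```python
def build_preference_pairs(prompt, student_next, model_samples):
    """Build DPO-style (prompt, chosen, rejected) triples.

    prompt        - the serialized dialogue so far (string)
    student_next  - the authentic student submission, treated as preferred
    model_samples - completions sampled from the SFT model, treated as
                    dispreferred when they diverge from the real student
    All names here are hypothetical; the paper may construct pairs differently.
    """
    pairs = []
    for sample in model_samples:
        # Skip degenerate pairs where chosen and rejected are identical,
        # since they carry no preference signal.
        if sample.strip() == student_next.strip():
            continue
        pairs.append({
            "prompt": prompt,
            "chosen": student_next,
            "rejected": sample,
        })
    return pairs
```

A preference trainer optimized on such triples would push the model's likelihood toward the real student's debugging move and away from its own off-distribution completions, which matches the stated goal of aligning responses with authentic student behavior.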