Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

arXiv cs.AI / 4/27/2026



Key Points

The paper proposes a proficiency-aligned framework that adapts LLM-generated spoken dialogues to K-12 non-native English learners, addressing problems caused by mismatches between model output and learner ability.
It introduces a four-tier grading approach to precisely control lexical complexity, tailored to China’s national curriculum (CSE) as an example but intended to be adaptable to other educational standards.
The core contribution is the DDPO algorithm (Diversity Driven Policy Optimization), a multi-turn, GRPO-based method aimed at maintaining dialogue diversity while jointly improving dialogue quality.
Experiments show the method outperforming conventional approaches, with low out-of-vocabulary rates and high diversity alongside improved conversational naturalness and pedagogical value.
The authors plan to open-source the models, data, and code, providing resources such as graded vocabulary lists and a multi-turn dialogue corpus for personalized English speaking practice.
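The key points describe controlling lexical complexity through a four-tier grading system and evaluating it partly by out-of-vocabulary (OOV) rate. The sketch below shows one way such an OOV rate could be computed against tiered vocabulary lists. It is not the authors' implementation. It assumes that tiers are cumulative (a tier-k learner knows every word from tiers 1 through k), that tokenization is a simple lowercase regex with no lemmatization, and that `GRADED_VOCAB`, `allowed_words`, and `oov_rate` are hypothetical names with toy word lists rather than the paper's CSE-based lists.

```python
import re

# Hypothetical toy four-tier vocabulary (NOT the paper's CSE lists).
# Tiers are treated as cumulative: tier k includes tiers 1..k.
GRADED_VOCAB = {
    1: {"hello", "i", "like", "apples", "do", "you", "yes", "no", "what", "is", "your", "name"},
    2: {"favorite", "weekend", "usually", "play", "football"},
    3: {"prefer", "environment", "protect"},
    4: {"sustainable", "consequence"},
}


def allowed_words(tier: int) -> set[str]:
    """Union of all vocabulary lists up to and including `tier`."""
    words: set[str] = set()
    for level in range(1, tier + 1):
        words |= GRADED_VOCAB[level]
    return words


def oov_rate(utterance: str, tier: int) -> float:
    """Fraction of word tokens not covered by the tier's cumulative vocabulary.

    Uses naive regex tokenization without lemmatization, so inflected
    forms (e.g. "apple" vs "apples") are treated as different words.
    """
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if not tokens:
        return 0.0
    vocab = allowed_words(tier)
    oov = [t for t in tokens if t not in vocab]
    return len(oov) / len(tokens)


if __name__ == "__main__":
    # All four tokens are tier-1 words, so nothing is out of vocabulary.
    print(oov_rate("Do you like apples?", 1))
    # "prefer" (tier 3) and "sustainable" (tier 4) exceed a tier-2 learner.
    print(oov_rate("I prefer sustainable football", 2))
```

Under these assumptions, a generated utterance could be checked against the target learner's tier and rejected or penalized when its OOV rate exceeds some threshold. A cumulative design keeps lower-tier words valid at every higher tier.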

Abstract

Large language models (LLMs) often fail to meet the pedagogical needs of K-12 English learners in non-native contexts due to a proficiency mismatch. To address this widespread challenge, we introduce a proficiency-aligned framework that adapts LLM outputs to learner abilities, using China's national curriculum (CSE) as a representative case. Our framework enables precise control over lexical complexity through a four-tier grading system, supported by a comprehensive suite of new resources: graded vocabulary lists and a multi-turn dialogue corpus. Our core technical contribution is the DDPO (Diversity Driven Policy Optimization) algorithm, a multi-turn GRPO-based approach designed to preserve dialogue diversity while holistically optimizing dialogue quality. This method significantly outperforms conventional approaches, achieving low out-of-vocabulary rates and high diversity while enhancing conversational naturalness and pedagogical value. While grounded in the CSE, our framework is designed for flexibility and can be readily adapted to other educational standards. Our models, data, and code will all be open-sourced, providing a scalable platform for personalized English speaking practice that effectively addresses the unique challenges faced by K-12 learners in non-immersive environments.
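The abstract describes DDPO as GRPO-based but does not give its reward or objective. The sketch below therefore illustrates only the general GRPO idea it builds on: sample a group of responses for the same prompt, score each one, and normalize every reward by the group's mean and standard deviation to obtain a relative advantage (population standard deviation here). The diversity term is a hypothetical stand-in, not DDPO's formulation. It credits the share of each response's word bigrams that no other response in the group uses. The names `novelty`, `group_relative_advantages`, `shaped_group_advantages`, and the `diversity_weight` parameter are all illustrative assumptions.

```python
import statistics


def bigrams(text: str) -> set[tuple[str, str]]:
    """Set of word bigrams in a response (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def novelty(responses: list[str]) -> list[float]:
    """Hypothetical diversity signal: share of each response's bigrams that
    no other response in the same sampled group contains."""
    grams = [bigrams(r) for r in responses]
    scores = []
    for i, g in enumerate(grams):
        if not g:
            scores.append(0.0)
            continue
        others = set().union(*(h for j, h in enumerate(grams) if j != i))
        scores.append(len(g - others) / len(g))
    return scores


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def shaped_group_advantages(
    responses: list[str], quality: list[float], diversity_weight: float = 0.5
) -> list[float]:
    """Hypothetical shaping: quality score plus a weighted novelty bonus,
    then group-normalized as in GRPO."""
    rewards = [q + diversity_weight * d for q, d in zip(quality, novelty(responses))]
    return group_relative_advantages(rewards)


if __name__ == "__main__":
    group = [
        "i like red apples",
        "i like red apples",
        "we play football after school",
    ]
    # Equal quality scores: only the novelty bonus separates the responses,
    # so the duplicated replies get negative advantages and the distinct one positive.
    print(shaped_group_advantages(group, [1.0, 1.0, 1.0]))
```

Because advantages are computed relative to the group, duplicated responses receive negative advantages and a distinct one a positive advantage even when their quality scores are identical. This illustrates how a group-based method can discourage collapse toward a single reply. How DDPO actually combines diversity with dialogue quality across multiple turns is defined in the paper, not here.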