When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools

arXiv cs.CL / 3/26/2026


Key Points

  • The paper argues that expert-only assessments of teacher–child interaction in Chinese preschools are too costly for continuous quality monitoring at scale, limiting timely interventions.
  • It presents TEPE-TCI-370h, a new large-scale dataset of naturalistic preschool interactions (370 hours across 105 classrooms) with standardized annotations for quality evaluation.
  • The authors introduce Interaction2Eval, an LLM-based framework designed for early childhood assessment that tackles Mandarin speech and rubric-based reasoning challenges, reporting up to 88% agreement with human experts.
  • In validation across 43 classrooms, the system reportedly achieved an 18x efficiency improvement, enabling a shift from infrequent expert audits to monthly AI-assisted monitoring with human oversight.
  • The work positions AI-augmented, continuous assessment as a pathway toward more scalable and equitable systemic improvement in early childhood education.

Abstract

High-quality teacher-child interaction (TCI) is fundamental to early childhood development, yet traditional expert-based assessment faces a critical scalability challenge. In a system as large as China's, which serves 36 million children across more than 250,000 kindergartens, the cost and time requirements of manual observation make continuous quality monitoring infeasible, relegating assessment to infrequent episodic audits that limit timely intervention and improvement tracking. In this paper, we investigate whether AI can serve as a scalable assessment teammate by extracting structured quality indicators and validating their alignment with human expert judgments. Our contributions are threefold: (1) TEPE-TCI-370h (Tracing Effective Preschool Education), the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and SSTEW annotations; (2) Interaction2Eval, a specialized LLM-based framework that addresses domain-specific challenges, including child speech recognition, Mandarin homophone disambiguation, and rubric-based reasoning, achieving up to 88% agreement with expert ratings; (3) deployment validation across 43 classrooms demonstrating an 18x efficiency gain in the assessment workflow, highlighting its potential for shifting from annual expert audits to monthly AI-assisted monitoring with targeted human oversight. This work not only demonstrates the technical feasibility of scalable, AI-augmented quality assessment but also lays the foundation for a new paradigm in early childhood education, one where continuous, inclusive, AI-assisted evaluation becomes the engine of systemic improvement and equitable growth.
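The headline metric here is exact agreement between AI and expert rubric scores. The paper does not specify how agreement was computed, but for ordinal rubric scales like those in ECERS-style instruments, evaluations typically report raw percent agreement alongside a chance-corrected statistic such as Cohen's kappa, since raw agreement can look inflated on skewed score distributions. A minimal sketch of both (the score lists below are invented for illustration, not drawn from the paper):

```python
from collections import Counter

def percent_agreement(expert, model):
    """Fraction of items where the model's score exactly matches the expert's."""
    assert len(expert) == len(model) and expert, "need equal-length, non-empty lists"
    matches = sum(1 for e, m in zip(expert, model) if e == m)
    return matches / len(expert)

def cohens_kappa(expert, model):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e).

    p_e is the agreement expected by chance, from each rater's
    marginal score distribution. Undefined when p_e == 1.
    """
    n = len(expert)
    p_o = percent_agreement(expert, model)
    ce, cm = Counter(expert), Counter(model)
    p_e = sum(ce[c] * cm[c] for c in set(ce) | set(cm)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric scores (e.g. a 1-5 quality scale) for 8 observations:
expert_scores = [3, 4, 2, 5, 4, 3, 4, 2]
model_scores  = [3, 4, 2, 4, 4, 3, 5, 2]

print(percent_agreement(expert_scores, model_scores))  # 0.75
print(round(cohens_kappa(expert_scores, model_scores), 3))
```

On these toy numbers, raw agreement is 0.75 but kappa is noticeably lower (about 0.65), which is why reporting both gives a fuller picture of rater reliability.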