Detecting Alarming Student Verbal Responses using Text and Audio Classifier

arXiv cs.CL / 4/21/2026

Key Points

  • The paper proposes a new hybrid framework for identifying troubled students in Automated Verbal Response Scoring (AVRS) by combining text and audio classification.
  • A text classifier is trained to detect concerning responses based on their content, while an audio classifier focuses on prosodic markers such as tone and speech patterns.
  • By jointly using content and prosody, the method aims to address shortcomings of traditional AVRS systems and improve detection performance.
  • The system is intended to speed up human review, enabling faster intervention when timely action could be life-saving.
  • The work is presented as an arXiv preprint (arXiv:2604.16717v1), highlighting an early research contribution to educational safety and monitoring.

Abstract

This paper addresses a critical safety gap in the use of Automated Verbal Response Scoring (AVRS). We present a novel hybrid framework for troubled student detection that combines a text classifier, trained to detect concerning responses based on their content, and an audio classifier, trained to detect them using prosodic markers. By considering both the content and the prosody of responses, this approach overcomes key limitations of traditional AVRS systems and achieves enhanced performance in identifying potentially concerning responses. The system can expedite human review, which can be life-saving when timely intervention is crucial.
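
The abstract does not say how the two classifiers' outputs are combined. As a rough illustration only, the sketch below assumes a simple late-fusion rule: each classifier emits a probability that a response is alarming, a response is flagged when either score crosses a threshold, and flagged responses are ordered so that human reviewers see the most concerning ones first. The names `ScoredResponse`, `is_flagged`, and `review_queue`, the 0.5 thresholds, and the example scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScoredResponse:
    response_id: str
    text_score: float   # hypothetical P(alarming) from the text classifier
    audio_score: float  # hypothetical P(alarming) from the audio classifier


def is_flagged(r, text_threshold=0.5, audio_threshold=0.5):
    """Flag when either modality crosses its threshold (an assumed OR rule)."""
    return r.text_score >= text_threshold or r.audio_score >= audio_threshold


def review_queue(responses, text_threshold=0.5, audio_threshold=0.5):
    """Return flagged responses, ordered by the larger of the two scores."""
    flagged = [r for r in responses if is_flagged(r, text_threshold, audio_threshold)]
    return sorted(flagged, key=lambda r: max(r.text_score, r.audio_score), reverse=True)


# Made-up scores for three responses; not data from the paper.
responses = [
    ScoredResponse("r1", text_score=0.10, audio_score=0.20),
    ScoredResponse("r2", text_score=0.90, audio_score=0.30),
    ScoredResponse("r3", text_score=0.20, audio_score=0.70),
]
queue_ids = [r.response_id for r in review_queue(responses)]
print(queue_ids)  # ['r2', 'r3']
```

Here `r3` is flagged on prosody alone even though its text score is low, which is the kind of case a content-only system would miss. An OR rule favors recall over precision, a plausible choice when a missed case is costly, but the thresholds and the fusion rule would need tuning against real review data.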