EmoTransCap: Dataset and Pipeline for Emotion Transition-Aware Speech Captioning in Discourses
arXiv cs.CL / 4/30/2026
Key Points
- The paper introduces EmoTransCap, a new paradigm and dataset aimed at emotion transition-aware speech captioning that goes beyond single, static emotion labels within isolated sentences.
- It includes an automated dataset-creation pipeline for building a large-scale corpus designed to capture emotion transitions at the discourse level.
- The work proposes the Multi-Task Emotion Transition Recognition (MTETR) model to jointly perform emotion transition detection and diarization.
- To improve training and usability, the authors use LLM-based semantic analysis to generate two annotation styles (descriptive and instruction-oriented) and also present a controllable transition-aware emotional speech synthesis system.
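
To make "emotion transition detection and diarization" concrete, here is a minimal sketch of what the two outputs could look like for one utterance. It is not the MTETR model or the authors' pipeline: the segment-level emotion labels are assumed to come from some upstream recognizer, and the names `EmotionSpan`, `diarize`, `find_transitions`, and `describe` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EmotionSpan:
    """A contiguous stretch of speech carrying one emotion (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    emotion: str


def diarize(segments: List[Tuple[float, float, str]]) -> List[EmotionSpan]:
    """Merge adjacent segments with the same label into emotion spans.

    `segments` is assumed to be time-ordered (start, end, emotion) triples
    produced by an upstream classifier; that classifier is out of scope here.
    """
    spans: List[EmotionSpan] = []
    for start, end, emotion in segments:
        if spans and spans[-1].emotion == emotion:
            spans[-1].end = end  # extend the current span
        else:
            spans.append(EmotionSpan(start, end, emotion))
    return spans


def find_transitions(spans: List[EmotionSpan]) -> List[Tuple[float, str, str]]:
    """Return (time, from_emotion, to_emotion) for each boundary between spans."""
    return [(prev.end, prev.emotion, nxt.emotion)
            for prev, nxt in zip(spans, spans[1:])]


def describe(spans: List[EmotionSpan]) -> str:
    """Render the spans as a simple descriptive string (a stand-in for a caption)."""
    return " -> ".join(f"{s.emotion} ({s.start:.1f}-{s.end:.1f}s)" for s in spans)


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    segments = [(0.0, 1.0, "neutral"), (1.0, 2.0, "neutral"),
                (2.0, 3.0, "angry"), (3.0, 4.0, "sad")]
    spans = diarize(segments)
    print(describe(spans))
    print(find_transitions(spans))
```

Diarization here answers "which emotion, and when", while transition detection answers "where does the emotion change, and between which states". A jointly trained model would presumably predict both directly from audio rather than derive one from the other as this sketch does.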