SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation
arXiv cs.CL / 4/23/2026
Key Points
- The paper introduces SpeechParaling-Bench, a new benchmark to evaluate paralinguistic-aware speech generation in large audio-language models (LALMs), addressing gaps in feature coverage and evaluation subjectivity.
- It expands paralinguistic feature coverage from under 50 to over 100 fine-grained features and provides 1,000+ English–Chinese parallel speech queries organized into three tasks of escalating difficulty: fine-grained control, intra-utterance variation, and context-aware adaptation.
- For more reliable assessment, the authors build a pairwise comparison pipeline where an LALM-based judge compares candidate responses against a fixed baseline, using relative preference to reduce subjectivity and human annotation cost.
- Experiments show that current LALMs have major weaknesses: even strong proprietary models struggle with comprehensive static control and dynamic modulation of paralinguistic features, and misinterpreting paralinguistic cues explains 43.3% of errors in situational dialogue.
- The results highlight the need for more robust paralinguistic modeling to build voice assistants that better align with human communication behavior.
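The pairwise evaluation protocol described above reduces each judgment to a win/tie/loss verdict for a candidate against a fixed baseline. A minimal sketch of how such verdicts could be aggregated into per-model preference scores (the function name, verdict labels, and the tie-counts-as-half convention are illustrative assumptions, not the paper's exact implementation):

```python
from collections import defaultdict

def win_rates(judgments):
    """Aggregate pairwise judge verdicts into per-model win rates.

    judgments: iterable of (model, verdict) pairs, where verdict is
    "win", "tie", or "loss" for the candidate vs. a fixed baseline.
    A tie counts as half a win, a common pairwise-evaluation convention.
    """
    wins = defaultdict(float)   # fractional wins per model
    totals = defaultdict(int)   # comparisons per model
    for model, verdict in judgments:
        totals[model] += 1
        if verdict == "win":
            wins[model] += 1.0
        elif verdict == "tie":
            wins[model] += 0.5
    return {m: wins[m] / totals[m] for m in totals}

# Example: 2 wins, 1 tie, 1 loss over 4 comparisons -> 0.625
scores = win_rates([("A", "win"), ("A", "tie"), ("A", "loss"), ("A", "win")])
```

Because every candidate is compared against the same baseline, the resulting scores are relative preferences rather than absolute quality ratings, which is what makes the judging less sensitive to subjective scales.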