Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions
arXiv cs.CL / 4/21/2026
Key Points
- The paper argues that deployed spoken language models struggle to distinguish third-party interruptions from a primary user’s speech, causing context-dependent failures.
- It introduces TPI-Train, an 88K-instance dataset using speaker-aware hard negatives to prioritize acoustic cues for interruption handling.
- It also proposes TPI-Bench, an evaluation framework that rigorously tests both a model's interruption-handling strategy and its speaker discrimination under deceptive scenarios.
- Experimental results indicate the dataset design reduces semantic shortcut learning, helping models rely on acoustic signals rather than only semantic context.
- The authors provide public code for the evaluation framework, aiming to support more robust multi-party spoken interaction.
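To make the "speaker-aware hard negatives" idea concrete, here is a minimal illustrative sketch (not the paper's actual pipeline; all names are hypothetical): each utterance text is paired once with the primary user's voice and once, verbatim, with a different speaker's voice. Since the text is identical across the pair, only the acoustic speaker identity separates the labels, which is what blocks the semantic shortcut the authors describe.

```python
from dataclasses import dataclass
import random

@dataclass
class Utterance:
    text: str        # transcript of the spoken content
    speaker_id: str  # acoustic identity of the talker
    label: str       # "respond" (primary user) or "ignore" (third party)

def make_speaker_aware_pairs(texts, primary_id, bystander_ids, rng=random):
    """Hypothetical sketch: for each text, emit a positive from the primary
    user plus a hard negative with *identical* text from another speaker,
    so only the speaker cue can distinguish the two labels."""
    pairs = []
    for text in texts:
        pos = Utterance(text, primary_id, "respond")
        neg = Utterance(text, rng.choice(bystander_ids), "ignore")
        pairs.append((pos, neg))
    return pairs

pairs = make_speaker_aware_pairs(
    ["what's the weather tomorrow?", "set a timer for ten minutes"],
    primary_id="spk_primary",
    bystander_ids=["spk_tv", "spk_guest"],
)
```

A model trained on such pairs cannot lower its loss by keying on the words alone; it is forced to attend to who is speaking, matching the paper's stated goal of prioritizing acoustic cues over semantic context.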