StarDrinks: An English and Korean Test Set for SLU Evaluation in a Drink Ordering Scenario

arXiv cs.CL / April 30, 2026


Key Points

  • The paper argues that task-oriented LLM and speech assistant evaluations often use overly controlled setups that don’t reflect the variability of real user requests.
  • It introduces StarDrinks, an English-and-Korean test set for a drink-ordering scenario with rich named entities, drink attributes, customizations, and brand-specific terminology.
  • The dataset also includes spontaneous speech phenomena like hesitations and self-corrections, aiming to better mirror natural user behavior.
  • StarDrinks provides annotations (slots) and supports multiple evaluation pathways, including speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR.
  • Overall, the benchmark is designed to assess model robustness and generalization in a linguistically complex, real-world task across speech and text modalities.
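To make the three evaluation pathways concrete, here is a minimal sketch of how a StarDrinks-style example might be structured and scored. The field names, slot names, and the micro-F1 metric are illustrative assumptions, not the dataset's actual schema or official metric:

```python
from dataclasses import dataclass

# Hypothetical StarDrinks-style item; field and slot names are
# illustrative assumptions, not the dataset's actual schema.
@dataclass
class DrinkOrderExample:
    audio_path: str        # speech utterance (input for SLU and ASR)
    transcription: str     # reference transcript (input for NLU, target for ASR)
    slots: dict            # annotated slots (target for SLU and NLU)

example = DrinkOrderExample(
    audio_path="en/order_0001.wav",
    transcription="um, can I get a grande iced latte with, uh, oat milk",
    slots={
        "drink_name": "latte",
        "size": "grande",
        "temperature": "iced",
        "milk": "oat milk",
    },
)

def slot_f1(pred: dict, gold: dict) -> float:
    """Micro F1 over (slot, value) pairs, a common SLU/NLU score."""
    tp = len(set(pred.items()) & set(gold.items()))
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# The three pathways the dataset supports:
#   ASR: model(audio) -> transcript, compared to example.transcription (e.g. WER)
#   NLU: model(example.transcription) -> slots, scored with slot_f1
#   SLU: model(audio) -> slots directly, scored with slot_f1
```

A perfect prediction scores `slot_f1(example.slots, example.slots) == 1.0`; a prediction that gets the drink right but the size wrong scores strictly between 0 and 1.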

Abstract

LLMs and speech assistants are increasingly used for task-oriented interactions, yet their evaluation often relies on controlled scenarios that fail to capture the variability and complexity of real user requests. Drink ordering, for example, involves diverse named entities, drink types, sizes, customizations, and brand-specific terminology, as well as spontaneous speech phenomena such as hesitations and self-corrections. To address this gap, we introduce StarDrinks, a test set in English and Korean containing speech utterances, transcriptions, and annotated slots. Our dataset supports speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR evaluation, providing a realistic benchmark for model robustness and generalization in a linguistically rich, real-world task.