StarDrinks: An English and Korean Test Set for SLU Evaluation in a Drink Ordering Scenario
arXiv cs.CL / 4/30/2026
📰 NewsIdeas & Deep AnalysisModels & Research
Key Points
- The paper argues that task-oriented LLM and speech assistant evaluations often use overly controlled setups that don’t reflect the variability of real user requests.
- It introduces StarDrinks, an English-and-Korean test set for a drink-ordering scenario with rich named entities, drink attributes, customizations, and brand-specific terminology.
- The dataset also includes spontaneous speech phenomena like hesitations and self-corrections, aiming to better mirror natural user behavior.
- StarDrinks provides annotations (slots) and supports multiple evaluation pathways, including speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR.
- Overall, the benchmark is designed to assess model robustness and generalization in a linguistically complex, real-world task across speech and text modalities.
Related Articles

Can AI Predict Pollution Before It Happens? The Smart Solution to an Old Problem
Dev.to
THE FIFTH TRANSMISSION: THE GRADIENT IS THE GOVERNMENT
Reddit r/artificial
Looking for feedback on OpenVidya: an open-source AI classroom layer for NCERT/CBSE [R]
Reddit r/MachineLearning

RAG Series (1): Why LLMs Need External Memory
Dev.to

One Open Source Project a Day (No. 54): Warp - The AI-Native Rust Terminal
Dev.to