Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization
arXiv cs.CL / 3/16/2026
Key Points
- The paper identifies a mismatch between ASR-trained encoders and text-based LLMs that makes Japanese SpeechLLMs output written-style text unsuitable for natural speech synthesis.
- It proposes a preference-based alignment approach (Direct Preference Optimization, per the title) that steers the model toward concise, conversational outputs that synthesize into natural-sounding speech.
- The authors introduce SpokenElyza, a Japanese speech-worthiness benchmark derived from ELYZA-tasks-100 with auditory verification by native experts.
- Experiments show substantial improvement on SpokenElyza while largely preserving performance on the original written-style evaluation.
- They plan to release SpokenElyza to support future research in Japanese spoken dialog systems.
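
For readers unfamiliar with Direct Preference Optimization, the standard per-example DPO loss can be sketched as below. This is the generic objective, not the authors' training code: the function name `dpo_loss`, the default `beta`, and the example log-probabilities are all illustrative assumptions. In this paper's setting, the "chosen" response would presumably be the speech-worthy output and the "rejected" one the written-style output.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic per-example DPO loss (hypothetical helper, not from the paper).

    Each argument is the summed log-probability of a whole response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative values only: the policy favors the chosen response slightly
# more than the reference does, so the loss falls below log(2).
example = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(example)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; it shrinks as the policy raises the chosen response relative to the rejected one.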