ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models

arXiv cs.CL / 4/14/2026


Key Points

  • The paper argues that standard raw-token reinforcement learning for end-to-end full-duplex speech language models can harm temporal dynamics, leading to semantic degradation, repetition, and generative collapse.
  • It introduces ASPIRin, an interactivity-optimized RL framework that explicitly separates timing control (when to speak vs. when to remain silent) from content generation (what to say).
  • ASPIRin uses Action Space Projection to convert the text vocabulary into a coarse-grained binary state representing active speech vs. inactive silence, then applies Group Relative Policy Optimization (GRPO) with rule-based rewards.
  • Empirical results indicate ASPIRin improves interactivity across turn-taking, backchanneling, and pause handling while substantially reducing duplicate n-grams by over 50% versus standard GRPO.
  • The key takeaway is that isolating timing from token selection preserves semantic coherence and mitigates degenerative repetition behavior.
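The projection step described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the `silence_ids` set (e.g. a dedicated silence or padding token) and the function name are assumptions introduced here for clarity.

```python
import math

def project_action_probs(logits, silence_ids):
    """Collapse a full-vocabulary distribution into a coarse binary
    action space: SPEAK (active speech) vs. SILENT (inactive silence).

    `silence_ids` is a hypothetical set of token ids treated as silence.
    """
    # Numerically stable softmax over the raw vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Sum the probability mass of silence tokens; everything else is speech.
    p_silent = sum(probs[i] for i in silence_ids)
    return {"SPEAK": 1.0 - p_silent, "SILENT": p_silent}
```

Because the policy gradient is then computed over this two-way action space rather than the raw token distribution, timing decisions can be rewarded without perturbing which content tokens the model prefers.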

Abstract

End-to-end full-duplex Speech Language Models (SLMs) require precise turn-taking for natural interaction. However, optimizing temporal dynamics via standard raw-token reinforcement learning (RL) degrades semantic quality, causing severe generative collapse and repetition. We propose ASPIRin, an interactivity-optimized RL framework that explicitly decouples when to speak from what to say. Using Action Space Projection, ASPIRin maps the text vocabulary into a coarse-grained binary state (active speech vs. inactive silence). By applying Group Relative Policy Optimization (GRPO) with rule-based rewards, it balances user interruption handling against response latency. Empirical evaluations show ASPIRin optimizes interactivity across turn-taking, backchanneling, and pause handling. Crucially, isolating timing from token selection preserves semantic coherence and reduces the proportion of duplicate n-grams by over 50% compared to standard GRPO, effectively eliminating degenerative repetition.
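To make the GRPO-with-rule-based-rewards idea concrete, the sketch below combines an illustrative interactivity reward with the group-relative advantage normalization that characterizes GRPO. The specific reward terms (a bonus for yielding the turn when the user interrupts, a latency penalty) and their coefficients are assumptions for illustration; the paper's exact reward rules may differ.

```python
from statistics import mean, pstdev

def rule_based_reward(response_latency_ms, yielded_on_interrupt):
    """Illustrative rule-based interactivity reward (coefficients assumed).

    Rewards ceding the floor on a user interruption and penalizes slow
    responses, trading off interruption handling against latency.
    """
    r = 1.0 if yielded_on_interrupt else -1.0
    r -= 0.001 * response_latency_ms
    return r

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each sampled rollout's reward
    against the mean and std of its own sampled group (no value critic)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]
```

In use, a group of rollouts for the same dialogue context would each be scored with `rule_based_reward`, and `group_relative_advantages` would convert those scores into the per-rollout advantages that weight the policy-gradient update over the binary speak/silent action space.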