GDPO-Listener: Expressive Interactive Head Generation via Auto-Regressive Flow Matching and Group reward-Decoupled Policy Optimization
arXiv cs.CV / 3/27/2026
Key Points
- The paper presents GDPO-Listener, a framework for generating expressive 3D head motion in dyadic virtual human interactions, with a focus on making “listener” motion more realistic.
- It uses an Auto-Regressive Flow Matching architecture to enable stable supervised learning for head-motion generation.
- To address listener “regression-to-the-mean” and static-face collapse, the method applies Group reward-Decoupled Policy Optimization (GDPO), which normalizes rewards separately for each FLAME parameter group to encourage high-variance, expressive motion.
- The approach also supports explicit semantic text control, so listener responses can be customized to align with provided text.
- Experiments on the Seamless Interaction and DualTalk datasets show improvements over baselines in long-term kinematic variance, visual expressivity, and semantic controllability.
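
To make the idea of decoupled reward normalization concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary above only says that reward normalization is separated across FLAME parameter groups, so the group names (`expression`, `jaw`, `head_pose`), the z-score normalization, the summing of per-group advantages, and the reward values are all assumptions chosen for illustration. The sketch contrasts normalizing each group's rewards separately across a set of sampled rollouts with normalizing a single summed reward.

```python
from statistics import mean, pstdev


def zscore(values, eps=1e-8):
    """Standardize a list of rewards to zero mean and (roughly) unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def decoupled_advantages(rewards_by_group):
    """Normalize each reward group separately across rollouts, then sum.

    rewards_by_group maps a (hypothetical) parameter-group name to a list
    holding one reward per sampled rollout.
    """
    normalized = [zscore(vals) for vals in rewards_by_group.values()]
    return [sum(per_rollout) for per_rollout in zip(*normalized)]


def coupled_advantages(rewards_by_group):
    """Baseline for comparison: sum raw rewards first, then normalize once."""
    totals = [sum(per_rollout) for per_rollout in zip(*rewards_by_group.values())]
    return zscore(totals)


# Made-up rewards for four sampled rollouts; the scales differ on purpose.
rewards = {
    "expression": [0.01, 0.02, 0.03, 0.04],
    "jaw": [0.5, 0.1, 0.9, 0.3],
    "head_pose": [10.0, 40.0, 20.0, 30.0],
}

decoupled = decoupled_advantages(rewards)
coupled = coupled_advantages(rewards)

best_decoupled = decoupled.index(max(decoupled))
best_coupled = coupled.index(max(coupled))
print("best rollout (decoupled):", best_decoupled)
print("best rollout (coupled):", best_coupled)
```

With these invented numbers, the coupled version ranks rollouts almost entirely by the large-scale `head_pose` reward, while the decoupled version gives each group equal weight, so a rollout that scores well on the small-scale groups can come out ahead. That is one plausible reading of how separating normalization could keep low-magnitude parameter groups from being drowned out.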