PolySLGen: Online Multimodal Speaking-Listening Reaction Generation in Polyadic Interaction
arXiv cs.CV / 4/10/2026
Key Points
- PolySLGen is an online framework, announced on arXiv, for generating human-like multimodal reaction behaviors (speech, body motion, and speaking state) for a target participant in polyadic group interactions.
- The approach takes as input the past conversation history and motion of all participants and produces temporally coherent future reactions, explicitly addressing prior work that was limited to single-modality outputs or speaking-only dyadic settings.
- To better capture group dynamics, PolySLGen introduces a pose-fusion module and a social-cue encoder that jointly aggregate motion and social signals across the group (see the illustrative sketch after this list).
- Quantitative and qualitative evaluations indicate that PolySLGen improves motion quality, motion–speech alignment, speaking-state prediction, and overall human-perceived realism over adapted and state-of-the-art baselines.
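The key points describe the architecture only at a high level, so the following is a minimal PyTorch sketch of how a pose-fusion module and a social-cue encoder *could* jointly aggregate per-participant signals before decoding one participant's reaction. The class name `PolyadicReactionSketch`, all dimensions, the attention-based fusion, and the mean-pooled cue aggregation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PolyadicReactionSketch(nn.Module):
    """Illustrative sketch only: fuses per-participant pose features and
    group social cues, then decodes a target participant's future motion
    and speaking-state. Not the PolySLGen implementation."""

    def __init__(self, pose_dim=64, cue_dim=32, hidden=128, n_heads=4):
        super().__init__()
        # "Pose-fusion": cross-participant attention over motion features,
        # queried by the target participant.
        self.pose_proj = nn.Linear(pose_dim, hidden)
        self.pose_fusion = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        # "Social-cue encoder": per-participant signals such as
        # speaking-state or turn-taking features (assumed inputs).
        self.cue_encoder = nn.Sequential(
            nn.Linear(cue_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # Temporal decoder plus output heads for future motion and a
        # binary speaking-state probability.
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.motion_head = nn.Linear(hidden, pose_dim)
        self.speaking_head = nn.Linear(hidden, 1)

    def forward(self, poses, cues, target_idx):
        # poses: (B, P, T, pose_dim) -- past motion of all P participants
        # cues:  (B, P, T, cue_dim)  -- per-participant social signals
        B, P, T, _ = poses.shape
        h = self.pose_proj(poses)                          # (B, P, T, H)
        h = h.permute(0, 2, 1, 3).reshape(B * T, P, -1)    # (B*T, P, H)
        # The target participant's features attend over the whole group,
        # independently at each time step.
        query = h[:, target_idx:target_idx + 1, :]         # (B*T, 1, H)
        fused, _ = self.pose_fusion(query, h, h)           # (B*T, 1, H)
        fused = fused.reshape(B, T, -1)                    # (B, T, H)
        # Aggregate social cues across the group (mean pooling here).
        social = self.cue_encoder(cues).mean(dim=1)        # (B, T, H)
        z, _ = self.decoder(torch.cat([fused, social], dim=-1))
        return self.motion_head(z), torch.sigmoid(self.speaking_head(z))

# Example: 2 clips, 4 participants, 30 past frames of history.
model = PolyadicReactionSketch()
poses = torch.randn(2, 4, 30, 64)
cues = torch.randn(2, 4, 30, 32)
future_motion, speaking_prob = model(poses, cues, target_idx=0)
print(future_motion.shape, speaking_prob.shape)  # (2, 30, 64) (2, 30, 1)
```

The target-as-query attention is one plausible way to realize "pose fusion" across a variable number of participants; the paper may use a different aggregation, and an online system would additionally run this autoregressively over streaming history.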