Action-Aware Generative Sequence Modeling for Short Video Recommendation
arXiv cs.AI / 4/29/2026
📰 News · Industry & Market Moves · Models & Research
Key Points
- The paper argues that conventional short-video recommender models, which reduce each video interaction to a single binary signal, struggle to capture users’ differing attitudes toward the diverse segments within a video as it plays.
- It proposes the Action-Aware Generative Sequence Network (A2Gen), which models user consumption as a temporal process in which the timing and patterns of user actions reveal different intentions.
- A2Gen uses a Context-aware Attention Module (CAM) to incorporate item-specific contextual features, a Hierarchical Sequence Encoder (HSE) to learn temporal action patterns from history, and an Action-seq Autoregressive Generator (AAG) to generate action sequences.
- Experiments on Kuaishou and Tmall datasets show the approach outperforms prior methods, and large-scale Kuaishou A/B tests report significant gains in watch time, interaction rate, and 7-day retention, leading to full deployment on traffic serving 400M+ daily users.
- Overall, the work demonstrates that action-timing-aware generative sequential modeling can improve multi-task short-video recommendation in both offline and online settings.
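To make the three-module pipeline above concrete, here is a minimal sketch of how context-aware attention (CAM), hierarchical history encoding (HSE), and autoregressive action generation (AAG) could fit together. All function names, dimensions, and the toy greedy decoder are illustrative assumptions; the paper's actual architecture and code are not published in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_attention(item_emb, ctx_feats, W_q, W_k, W_v):
    """CAM-style sketch: attend over item-specific context features."""
    q = item_emb @ W_q                       # (d,)
    k = ctx_feats @ W_k                      # (n_ctx, d)
    v = ctx_feats @ W_v                      # (n_ctx, d)
    w = softmax(q @ k.T / np.sqrt(len(q)))   # attention weights, (n_ctx,)
    return w @ v                             # context summary, (d,)

def encode_history(step_embs):
    """HSE-style sketch: pool per-video action steps, then pool over videos."""
    video_vecs = [steps.mean(axis=0) for steps in step_embs]
    return np.mean(video_vecs, axis=0)

def generate_actions(user_vec, action_embs, max_len=5):
    """AAG-style sketch: greedy autoregressive decoding of next actions."""
    state, out = user_vec.copy(), []
    for _ in range(max_len):
        logits = action_embs @ state         # score each candidate action
        a = int(np.argmax(logits))
        out.append(a)
        state = 0.5 * state + 0.5 * action_embs[a]  # fold chosen action back in
    return out

# Toy usage with hypothetical dimensions.
d, n_ctx, n_actions = 8, 4, 6
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
item = rng.normal(size=d)
ctx = rng.normal(size=(n_ctx, d))
history = [rng.normal(size=(3, d)), rng.normal(size=(5, d))]
action_embs = rng.normal(size=(n_actions, d))

user_vec = encode_history(history) + context_attention(item, ctx, W_q, W_k, W_v)
actions = generate_actions(user_vec, action_embs)
```

The key design point this sketch mirrors is that the model emits an ordered *sequence* of actions per item rather than a single binary prediction, so the decoder's state evolves with each generated action.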
