STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows

Apple Machine Learning Journal / 4/30/2026

Key Points

  • The paper proposes STARFlow-V, an end-to-end video generative model built on normalizing flows, revisiting a design space dominated in video by diffusion models.
  • It argues STARFlow-V can offer advantages including end-to-end learning, robust causal prediction, and native likelihood estimation for continuous video data.
  • The work is motivated by the renewed progress of normalizing flows in image generation, and notes that in video, where spatiotemporal complexity and compute cost are substantially higher, state-of-the-art systems still rely almost exclusively on diffusion models.
  • STARFlow-V is presented as a response to these challenges, aiming to bring likelihood-based modeling benefits to video generation.
  • The authors publish the study as an April 2026 paper and provide an arXiv link for the full publication details.

Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation…
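
To make "native likelihood estimation" concrete, here is a minimal sketch of how any normalizing flow scores a data point: map it through an invertible transform to a simple base distribution, then add the log-determinant of the transform's Jacobian (the change-of-variables rule). This is a generic one-dimensional toy, not STARFlow-V's architecture, which this summary does not describe. The function names and the `shift` and `log_scale` parameters are hypothetical and chosen only for illustration.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x: float, shift: float, log_scale: float) -> float:
    """Exact log-likelihood of x under a one-layer affine flow.

    Forward map (data -> latent): z = (x - shift) * exp(-log_scale)
    Change of variables:          log p(x) = log p_base(z) + log |dz/dx|
    """
    z = (x - shift) * math.exp(-log_scale)
    log_det_jacobian = -log_scale  # dz/dx = exp(-log_scale)
    return standard_normal_logpdf(z) + log_det_jacobian


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Closed-form Gaussian log density, used only as a reference check."""
    return (
        -0.5 * ((x - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )


if __name__ == "__main__":
    x, shift, log_scale = 1.3, 0.5, math.log(2.0)
    flow_value = affine_flow_logpdf(x, shift, log_scale)
    reference = gaussian_logpdf(x, mean=shift, std=math.exp(log_scale))
    print(f"flow log-likelihood:      {flow_value:.6f}")
    print(f"reference log-likelihood: {reference:.6f}")
```

An affine flow over a standard normal base is exactly a Gaussian with mean `shift` and standard deviation `exp(log_scale)`, so the two printed values should agree. Practical flows stack many learned invertible layers, but the likelihood is still computed by this same rule, which is why no separate likelihood approximation is needed.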
