Pioneering Perceptual Video Fluency Assessment: A Novel Task with Benchmark Dataset and Baseline
arXiv cs.CV / 3/30/2026
Key Points
- The paper argues that existing Video Quality Assessment (VQA) approaches often fail to capture "video fluency," motivating Video Fluency Assessment (VFA) as a standalone temporal perceptual task.
- It introduces a new fluency-focused benchmark dataset, FluVid, containing 4,606 in-the-wild videos with a balanced fluency distribution and new human study–based scoring criteria.
- A large-scale benchmark across 23 methods is presented to evaluate progress on FluVid and to inform VFA-specific model design choices.
- The authors propose a baseline model, FluNet, using temporal permuted self-attention (T-PSA) to better encode fluency-relevant cues and improve long-range frame interactions.
- Results indicate that FluNet achieves state-of-the-art performance on the proposed benchmark, and the paper sketches a research roadmap for further exploration of VFA.
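The summary names temporal permuted self-attention (T-PSA) as FluNet's core mechanism but gives no details. As an illustration only, the sketch below assumes one plausible reading: per-frame features are permuted along the temporal axis, standard self-attention is applied, and the original frame order is then restored. All function names and shapes here are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(frames, Wq, Wk, Wv):
    """Plain self-attention over the temporal (frame) axis.
    frames: (T, D) array of per-frame features."""
    q, k, v = frames @ Wq, frames @ Wk, frames @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])  # (T, T) frame-to-frame affinities
    return softmax(scores) @ v

def permuted_temporal_attention(frames, Wq, Wk, Wv, rng):
    """Hypothetical T-PSA-style pass (assumed interpretation):
    shuffle frames, attend, then restore the original order."""
    perm = rng.permutation(frames.shape[0])
    inv = np.argsort(perm)          # inverse permutation
    out = temporal_self_attention(frames[perm], Wq, Wk, Wv)
    return out[inv]

# Toy example: 8 frames with 16-dim features.
rng = np.random.default_rng(0)
T, D = 8, 16
frames = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
out = permuted_temporal_attention(frames, Wq, Wk, Wv, rng)
print(out.shape)  # (8, 16)
```

Note that without positional encodings self-attention is permutation-equivariant, so this bare sketch matches plain temporal attention exactly; a real model would inject temporal position information so that the permutation actually changes which long-range frame interactions are emphasized.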