[R] What kind of video benchmark is missing for VLMs?

Reddit r/MachineLearning / 3/17/2026

💬 Opinion | Ideas & Deep Analysis | Models & Research


Key Points

The post notes existing video-language model benchmarks such as VideoMME, MLVU, MVBench, and LVBench.
It asks what kind of benchmark is missing for VLMs and what kind of dataset could be created to test more physical and open-world capabilities.
It suggests a benchmark direction that emphasizes real-world physicality and open-world understanding beyond current datasets.
It is authored by user Alternative_Art2984 on Reddit and links to a discussion in r/MachineLearning.

I've been looking through a lot of benchmarks for evaluating VLMs on video, for instance VideoMME, MLVU, MVBench, LVBench, and many more.

I'm still figuring out what is missing in terms of benchmarking VLMs. For example, what kind of dataset could I create to make evaluation more physical and open-world?

submitted by /u/Alternative_Art2984