AI Navigate

[R] What kind of video benchmark is missing for VLMs?

Reddit r/MachineLearning / 3/17/2026

💬 Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The post notes existing video-language model benchmarks such as VideoMME, MLVU, MVBench, and LVBench.
  • It asks what kind of benchmark is missing for VLMs and what kind of dataset could be created to test more physical and open-world capabilities.
  • It suggests a benchmark direction that emphasizes real-world physicality and open-world understanding beyond current datasets.
  • It is authored by user Alternative_Art2984 on Reddit and links to a discussion in r/MachineLearning.

I am just curious. I have been searching through lots of benchmarks for evaluating VLMs on video, for instance VideoMME, MLVU, MVBench, LVBench, and many more.

I am still figuring out what is missing in terms of benchmarking VLMs. For example, what kind of dataset could I create to make evaluation more physical and open-world?

submitted by /u/Alternative_Art2984