VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects
arXiv cs.CL / April 20, 2026
📰 News · Models & Research
Key Points
- The paper introduces VEFX-Dataset, a large-scale, human-annotated dataset for instruction-guided video editing with 5,049 examples spanning multiple edit categories, each labeled for quality along three decoupled dimensions.
- It proposes VEFX-Reward, a dedicated reward model that evaluates video editing quality by jointly analyzing the source video, the editing instruction, and the edited result.
- The work releases VEFX-Bench, a curated benchmark of 300 video-prompt pairs to enable standardized comparisons between different video editing systems.
- Experiments indicate that VEFX-Reward matches human judgments more closely than generic vision-language-model judges and prior reward models, and it is used to benchmark both commercial and open-source editors.
- Benchmark results reveal a persistent gap in current systems among visual plausibility, instruction following, and edit locality (keeping unrelated regions untouched).
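The reward model described above scores an edit by jointly conditioning on the source video, the instruction, and the edited result, producing scores along three decoupled dimensions. A minimal sketch of such a scoring interface is shown below; all names, the choice of dimensions' weighting, and the dummy scoring logic are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class EditScore:
    """Hypothetical per-dimension scores; field names are assumptions."""
    visual_quality: float         # plausibility of the edited video
    instruction_following: float  # does the edit match the instruction?
    edit_locality: float          # are unrelated regions left untouched?

    def aggregate(self) -> float:
        # Unweighted mean for illustration; the paper's aggregation may differ.
        return (self.visual_quality
                + self.instruction_following
                + self.edit_locality) / 3.0

def score_edit(source_frames, instruction, edited_frames) -> EditScore:
    """Placeholder for a learned reward model that jointly encodes the
    source video, the editing instruction, and the edited result."""
    # A real model would run all three inputs through a video-language
    # backbone; here we just return fixed dummy scores.
    return EditScore(visual_quality=0.8,
                     instruction_following=0.7,
                     edit_locality=0.9)

score = score_edit(["frame0.png"], "make the sky purple", ["frame0_edit.png"])
print(round(score.aggregate(), 3))
```

Decoupling the three dimensions lets a benchmark report, for example, that an editor is visually plausible but leaks changes outside the targeted region, rather than collapsing everything into one opaque number.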