UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs
arXiv cs.CV / 4/20/2026
Key Points
- The paper introduces UniEditBench, a unified benchmark designed to fairly evaluate both image and video editing models under a shared protocol across different paradigms.
- It defines a detailed taxonomy and operation coverage—nine image operations and eight video operations—including challenging tasks such as counting and spatial reordering.
- Because existing automatic metrics often diverge from human preferences, and directly using a large multimodal LLM (MLLM) as an evaluator is too costly, the authors distill a high-capacity MLLM judge into smaller 4B/8B evaluators.
- The distilled evaluators deliver multi-dimensional scoring (e.g., structural fidelity, text alignment, background consistency, naturalness, and temporal-spatial consistency for videos) and show strong agreement with human judgments while greatly reducing evaluation cost.
- UniEditBench and the associated reward models are released publicly for reproducible benchmarking of modern visual editing methods.
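The multi-dimensional scoring described above can be sketched as a simple aggregation over per-dimension judgments. The dimension names follow the key points, but the `EditScore` structure and the uniform-average aggregation are illustrative assumptions, not the paper's actual protocol:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class EditScore:
    # Per-dimension scores in [0, 1], as a distilled evaluator might emit them.
    # Dimension names follow the key points; the structure itself is hypothetical.
    structural_fidelity: float
    text_alignment: float
    background_consistency: float
    naturalness: float
    temporal_spatial_consistency: Optional[float] = None  # videos only

    def overall(self) -> float:
        """Uniformly average the dimensions that apply (assumed aggregation)."""
        vals = [getattr(self, f.name) for f in fields(self)]
        vals = [v for v in vals if v is not None]
        return sum(vals) / len(vals)

# Image edit: four dimensions apply.
img = EditScore(0.9, 0.8, 0.7, 0.6)
# Video edit: temporal-spatial consistency is scored as well.
vid = EditScore(0.9, 0.8, 0.7, 0.6, 0.5)
```

In practice the per-dimension scores would come from the distilled 4B/8B evaluators; only the aggregation into a single comparable number is sketched here.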