LumiVideo: An Intelligent Agentic System for Video Color Grading

arXiv cs.AI / 4/6/2026

Key Points

  • LumiVideo is an agentic system for video color grading that aims to replicate professional colorists’ workflow using a four-stage Perception → Reasoning → Execution → Reflection process.
  • Instead of directly outputting edited pixels, it analyzes raw log video and produces interpretable, industry-standard ASC-CDL parameters plus a temporally consistent 3D LUT.
  • Its Reasoning component combines an LLM’s cinematic knowledge with a RAG setup and a Tree-of-Thoughts search to explore and optimize a non-linear color parameter space.
  • An optional natural-language Reflection loop enables iterative refinement driven by creator feedback, improving controllability versus prior black-box automation.
  • The work also introduces LumiGrade, a benchmark for evaluating automated grading on log-encoded video, with results indicating near-human quality in fully automatic mode.
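
To make the ASC-CDL and 3D LUT output concrete, here is a minimal sketch of the standard ASC-CDL transfer function (slope, offset, power, then saturation with Rec. 709 luma weights) baked into a small LUT. The function names, the grade values, and the nearest-neighbor lookup are illustrative assumptions, not LumiVideo's code; production LUTs normally use trilinear or tetrahedral interpolation, and the log-to-display conversion is omitted.

```python
from itertools import product

# Rec. 709 luma weights used by the ASC-CDL saturation step.
REC709_LUMA = (0.2126, 0.7152, 0.0722)


def clamp01(x):
    """Clamp a value to the [0, 1] range."""
    return min(1.0, max(0.0, x))


def apply_cdl(rgb, slope, offset, power, saturation=1.0):
    """Apply slope/offset/power per channel, then a global saturation."""
    sop = [clamp01(c * s + o) ** p
           for c, s, o, p in zip(rgb, slope, offset, power)]
    luma = sum(w * c for w, c in zip(REC709_LUMA, sop))
    return tuple(clamp01(luma + saturation * (c - luma)) for c in sop)


def bake_lut(size, **cdl):
    """Sample the CDL on a size^3 grid; keys are integer grid indices."""
    step = 1.0 / (size - 1)
    return {
        (r, g, b): apply_cdl((r * step, g * step, b * step), **cdl)
        for r, g, b in product(range(size), repeat=3)
    }


def lookup_nearest(lut, size, rgb):
    """Nearest-grid-point lookup (an illustrative simplification)."""
    idx = tuple(round(clamp01(c) * (size - 1)) for c in rgb)
    return lut[idx]


# Hypothetical warm grade; these values are made up for illustration.
grade = dict(slope=(1.1, 1.0, 0.9), offset=(0.02, 0.0, -0.02),
             power=(1.0, 1.0, 1.0), saturation=1.2)
lut = bake_lut(17, **grade)

# The same LUT is applied to every frame, so a given input color
# always maps to the same output color regardless of frame index.
frame_a = [(0.25, 0.5, 0.75), (0.0, 0.0, 0.0)]
frame_b = [(0.25, 0.5, 0.75), (1.0, 1.0, 1.0)]
graded_a = [lookup_nearest(lut, 17, px) for px in frame_a]
graded_b = [lookup_nearest(lut, 17, px) for px in frame_b]
```

Because the grade is a fixed per-pixel function rather than a per-frame prediction, frame-to-frame consistency follows from the construction: `graded_a[0]` and `graded_b[0]` are identical by definition.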

Abstract

Video color grading is a critical post-production process that transforms flat, log-encoded raw footage into emotionally resonant cinematic visuals. Existing automated methods act as static, black-box executors that directly output edited pixels, lacking both interpretability and the iterative control required by professionals. We introduce LumiVideo, an agentic system that mimics the cognitive workflow of professional colorists through four stages: Perception, Reasoning, Execution, and Reflection. Given only raw log video, LumiVideo autonomously produces a cinematic base grade by analyzing the scene's physical lighting and semantic content. Its Reasoning engine synergizes an LLM's internalized cinematic knowledge with a Retrieval-Augmented Generation (RAG) framework via a Tree of Thoughts (ToT) search to navigate the non-linear color parameter space. Rather than generating pixels, the system compiles the deduced parameters into industry-standard ASC-CDL configurations and a globally consistent 3D LUT, analytically guaranteeing temporal consistency. An optional Reflection loop then allows creators to refine the result via natural language feedback. We further introduce LumiGrade, the first log-encoded video benchmark for evaluating automated grading. Experiments show that LumiVideo approaches human expert quality in fully automatic mode while enabling precise iterative control when directed.
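
The search over the color parameter space can be pictured with a toy breadth-first search with beam pruning, one of the strategies used in Tree-of-Thoughts style methods. Everything here is an assumption made for illustration: `propose` stands in for an LLM generating candidate adjustments, `score` stands in for an evaluator, and the scalar `slope`/`offset`/`power` parameters and the target look are hypothetical. The abstract does not describe the actual prompts, retrieval, or scoring.

```python
import random


def propose(params, rng, step=0.1, n=3):
    """Stand-in for the LLM 'thought' generator: n perturbed children."""
    return [
        {k: v + rng.uniform(-step, step) for k, v in params.items()}
        for _ in range(n)
    ]


def score(params, target):
    """Stand-in evaluator: negative squared distance to a target look."""
    return -sum((params[k] - target[k]) ** 2 for k in target)


def tree_search(root, target, depth=4, beam=2, seed=0):
    """Expand each frontier node, keep the top `beam` children per level."""
    rng = random.Random(seed)
    frontier = [root]
    best = root
    for _ in range(depth):
        children = [c for node in frontier for c in propose(node, rng)]
        children.sort(key=lambda p: score(p, target), reverse=True)
        frontier = children[:beam]
        # Track the best candidate seen at any level, including the root.
        best = max([best, *frontier], key=lambda p: score(p, target))
    return best


# Hypothetical neutral starting grade and target look.
root = {"slope": 1.0, "offset": 0.0, "power": 1.0}
target = {"slope": 1.2, "offset": -0.05, "power": 0.9}
best = tree_search(root, target)
```

In a system like the one described, the proposer and evaluator would be model calls informed by retrieved references rather than random perturbations and a distance function; the sketch only shows the expand, evaluate, and prune loop.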