Netflix - yes Netflix - jumps on the AI bandwagon with video editor

The Register / 4/4/2026

📰 News | Tools & Practical Usage | Industry & Market Moves | Models & Research

Key Points

  • Netflix has released VOID, an AI video editor built on a “video-language model” that removes objects from a scene and regenerates how the remaining objects behave without them.
  • Rather than only filling in the pixels a removed object occupied, the model also revises the interactions it caused, such as collisions, debris, or splashes, so the edited footage stays physically plausible.
  • The release signals Netflix’s broader move to integrate generative AI into video creation and post-production workflows.
  • The development highlights growing competition to apply multimodal AI (video + language) to practical media-editing tasks rather than only content analysis or generation.

Video-language model revises how objects interact when things get removed from a scene

Fri 3 Apr 2026 // 20:42 UTC

A new Netflix model promises to rewrite the way we make movies. Just imagine this. As the director of the multi-million dollar epic Car Crash III: Suddenest Impact, you've just finished filming the finale where your star, Cruz Control, drives straight into an onrushing semi.

The collision is spectacular. Cruz's car – operated remotely – explodes on impact, scattering debris across the highway. It's glorious. You high-five Cruz, moping beside you at the camera monitor station as his lucrative franchise career concludes, and head to the craft services truck.

Your producer, Maya Cash, grabs you by the shoulder. "You're not going to want to hear this," she says. "But what if Cruz just drives away into the sunset? What if he doesn't die after all?"

You pause and look at her over the rims of your Balenciaga sunglasses. "They're going to fund number four after all?"

Netflix's VOID model was made for that moment. Instead of reshooting the scene or redoing it entirely with computer graphics, you can just transform the crash footage into an open-road denouement.

VOID stands for Video Object and Interaction Deletion. It's a VLM (vision-language model) that can not only erase objects from a scene but can also inpaint how remaining objects in the scene should behave without the influence of whatever was excised.

It can turn, for example, a head-on collision between two vehicles into a scene of a single vehicle driving down the road by removing one and generating video depicting the physically plausible path of the remaining vehicle. Post-impact debris, smoke, and flames – all erased and replaced with pristine pavement.

The video model's creators – Saman Motamed (Netflix/Sofia University), William Harvey (Netflix), Benjamin Klein (Netflix), Luc Van Gool (Sofia University), Zhuoning Yuan (Netflix), and Ta-Ying Cheng (Netflix) – describe VOID in a preprint paper [PDF] as "a video object removal framework designed to perform physically-plausible inpainting in these complex scenarios."

The same holds for subtler interactions. Given a scene of a person jumping into a pool and splashing water on the ground, VOID could remove that person and generate video in which the pool appears undisturbed, with no splash in the pool or on the ground.
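To make the distinction concrete, here is a toy Python sketch. It is not Netflix's implementation. Plain object removal would regenerate only the pixels the object covers, whereas interaction-aware removal also has to regenerate the pixels the object affected, such as the splash. The hand-written masks, the tiny 3x6 frame, and the function names `removal_region` and `count_pixels` are all hypothetical, and in a real system a generative video model would fill in the marked pixels.

```python
# Toy illustration only: the masks are hand-written, not produced by any model.
# 1 marks a pixel that belongs to the mask, 0 a pixel that does not.

def removal_region(object_mask, effect_mask):
    """Return the per-pixel union of two equally sized binary masks (one frame)."""
    if len(object_mask) != len(effect_mask):
        raise ValueError("masks must have the same number of rows")
    region = []
    for obj_row, eff_row in zip(object_mask, effect_mask):
        if len(obj_row) != len(eff_row):
            raise ValueError("mask rows must have the same width")
        region.append([int(o or e) for o, e in zip(obj_row, eff_row)])
    return region


def count_pixels(mask):
    """Count the pixels marked for regeneration."""
    return sum(sum(row) for row in mask)


# Hypothetical frame: the diver occupies two pixels, the splash three more.
diver = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
splash = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]

object_only = count_pixels(diver)
with_interactions = count_pixels(removal_region(diver, splash))
print(object_only, with_interactions)  # prints "2 5"
```

In this toy frame, erasing the diver alone would leave the three splash pixels behind, which is the kind of leftover that interaction-aware removal is meant to clean up.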

VOID isn't limited to Netflix productions alone. The company has made its model available on Hugging Face, where anyone can download it.

There are other tools for altering video, such as Runway, Generative Omnimatte, DiffuEraser, ROSE, MiniMax-Remover, and ProPainter. The Netflix boffins, however, claim VOID outperforms these alternatives substantially. Based on a survey of 25 people across multiple scenarios, VOID was preferred 64.8 percent of the time, with Runway coming in a distant second at 18.4 percent.
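As a rough illustration of how preference shares like these are computed, the Python sketch below tallies votes per tool and converts them to percentages. The vote counts are hypothetical, chosen only so that the output matches the two shares quoted above. The study's raw tallies are not given in this article, and the "other tools combined" bucket and the function name `preference_shares` are likewise assumptions.

```python
def preference_shares(votes):
    """Convert raw vote counts per tool into percentage shares (one decimal)."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no votes recorded")
    return {tool: round(100 * count / total, 1) for tool, count in votes.items()}


# Hypothetical tallies, NOT the study's data: picked to reproduce the
# reported 64.8 percent and 18.4 percent shares.
hypothetical_votes = {
    "VOID": 81,
    "Runway": 23,
    "other tools combined": 21,
}

shares = preference_shares(hypothetical_votes)
for tool, share in shares.items():
    print(f"{tool}: {share} percent")
```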

"Through extensive evaluations against inpainting and text-guided video model baselines on synthetic and real-world data, we show that VOID excels at modeling complex dynamics which can follow on from object removal," the authors claim.

Whether the world really needs more convincing video manipulation is another question. ®
