Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

arXiv cs.CV / 4/17/2026


Key Points

  • Video outpainting extends video content beyond the original frame boundaries while maintaining spatial fidelity and temporal coherence; existing diffusion-based methods model temporal structure only implicitly and draw on limited spatial context, which leads to inconsistencies.
  • The paper proposes Seen-to-Scene, a framework that unifies propagation-based and generation-based approaches to reduce intra-frame and inter-frame inconsistencies, especially in dynamic scenes and large outpainting regions.
  • Seen-to-Scene uses flow-based propagation built on a flow completion network pre-trained for video inpainting and fine-tuned end-to-end, so it reconstructs coherent motion fields and bridges the domain gap (see the warping sketch after this list).
  • It also introduces reference-guided latent propagation to improve the efficiency and reliability of propagating source content across frames.
  • Experiments report improved temporal coherence and visual realism with efficient inference, outperforming prior state-of-the-art approaches that require input-specific adaptation.

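The flow-based propagation in the third key point amounts to carrying seen pixels along a (completed) motion field into the unseen border. The paper's code is not included in this summary, so the following is a minimal PyTorch sketch of that warping step only; the function names `warp_with_flow` and `propagate_seen_content`, and the assumption that flow is given in pixel units mapping each target pixel to its source location, are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(src: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `src` (B, C, H, W) using `flow` (B, 2, H, W) in pixel units.

    flow[:, 0] holds horizontal and flow[:, 1] vertical displacements that map
    each target pixel to its source location. Illustrative helper only.
    """
    _, _, h, w = src.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=src.device, dtype=src.dtype),
        torch.arange(w, device=src.device, dtype=src.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # (B, H, W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1] as expected by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(src, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)

def propagate_seen_content(prev_frame, curr_frame, flow_curr_to_prev, known_mask):
    """Fill the unknown (outpainted) region of `curr_frame` by warping `prev_frame`.

    `known_mask` is 1 where `curr_frame` already has valid (seen) content and 0
    in the outpainting region; pixels the flow cannot reach remain 0 and are
    left for the generative model to synthesize.
    """
    warped = warp_with_flow(prev_frame, flow_curr_to_prev)
    return known_mask * curr_frame + (1.0 - known_mask) * warped
```

In this reading, the flow completion network's role is to produce a plausible `flow_curr_to_prev` even in the border region where no pixels were observed, so that propagation and generation cover complementary parts of the expanded frame.
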
Abstract

Video outpainting aims to expand the visible content of a video beyond the original frame boundaries while preserving spatial fidelity and temporal coherence across frames. Existing methods primarily rely on large-scale generative models, such as diffusion models. However, generation-based approaches suffer from implicit temporal modeling and limited spatial context. These limitations lead to intra-frame and inter-frame inconsistencies, which become particularly pronounced in dynamic scenes and large outpainting scenarios. To overcome these challenges, we propose Seen-to-Scene, a novel framework that unifies propagation-based and generation-based paradigms for video outpainting. Specifically, Seen-to-Scene leverages flow-based propagation with a flow completion network pre-trained for video inpainting, which is fine-tuned in an end-to-end manner to bridge the domain gap and reconstruct coherent motion fields. To further improve the efficiency and reliability of propagation, we introduce a reference-guided latent propagation that effectively propagates source content across frames. Extensive experiments demonstrate that our method achieves superior temporal coherence and visual realism with efficient inference, surpassing even prior state-of-the-art methods that require input-specific adaptation.
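The abstract describes reference-guided latent propagation only at a high level. The sketch below is one plausible reading under stated assumptions: pixel-space flow is rescaled to the latent resolution of a diffusion model, a reference frame's latent is warped along it (reusing the `warp_with_flow` helper from the sketch above), and the warped content is blended into the unseen portion of the current latent. The helper names, the rescaling step, and the blending rule are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def resize_flow_to_latent(flow_px: torch.Tensor, latent_hw) -> torch.Tensor:
    """Resize a pixel-space flow field (B, 2, H, W) to latent resolution and
    rescale its displacements to latent pixels (assumed preprocessing step)."""
    _, _, h, w = flow_px.shape
    lh, lw = latent_hw
    flow = F.interpolate(flow_px, size=(lh, lw), mode="bilinear", align_corners=True)
    scale = flow.new_tensor([lw / w, lh / h]).view(1, 2, 1, 1)
    return flow * scale

def reference_guided_blend(latent, ref_latent, flow_latent, latent_mask, blend=1.0):
    """Blend a warped reference latent into the unseen part of `latent`.

    latent:      (B, C, h, w) latent currently being denoised/generated
    ref_latent:  (B, C, h, w) latent encoding of a reference frame (seen content)
    flow_latent: (B, 2, h, w) flow at latent resolution (see resize_flow_to_latent)
    latent_mask: (B, 1, h, w) 1 where the latent corresponds to seen pixels
    """
    warped_ref = warp_with_flow(ref_latent, flow_latent)  # helper from the sketch above
    propagated = blend * warped_ref + (1.0 - blend) * latent
    return latent_mask * latent + (1.0 - latent_mask) * propagated
```

Operating in latent space keeps propagation cheap relative to pixel-space warping at full resolution, which is consistent with the efficiency claims in the abstract, though the exact mechanism the authors use may differ from this sketch.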