Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

arXiv cs.CV, April 16, 2026


Key Points

  • The paper introduces a new “blind” bitstream-corrupted video recovery setting that avoids needing manually provided corruption masks, making restoration more practical for real-world degradation.
  • It proposes a Metadata-Guided Diffusion Model (M-GDM) that uses intrinsic video metadata (e.g., motion vectors and frame types) via a dual-stream encoder and cross-attention at each diffusion step to identify corrupted regions and guide reconstruction.
  • A prior-driven mask predictor generates pseudo masks from metadata and diffusion priors, enabling separation of intact versus to-be-recovered latent regions through hard masking and recombination.
  • To reduce visible seams and boundary artifacts from imperfect mask estimation, the method adds a post-refinement module that improves consistency between preserved and restored areas.
  • Experiments reportedly show strong performance, surpassing prior blind video recovery approaches; code is released on GitHub.
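The metadata-guidance idea in the second point can be sketched as plain cross-attention: latent tokens of the corrupted frame attend over a fused metadata representation built from separately embedded motion vectors and frame types. This is a minimal, pure-Python illustration; all names, shapes, and the concatenation-based fusion are assumptions for exposition, not the paper's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Each latent token (query) attends over metadata tokens (keys/values)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def fuse_metadata(mv_emb, ft_emb):
    """Hypothetical dual-stream fusion: concatenate per-token motion-vector
    and frame-type embeddings into one unified representation."""
    return [mv + ft for mv, ft in zip(mv_emb, ft_emb)]

# Toy example: 3 metadata tokens (dim 2 per stream -> dim 4 fused), 2 latent tokens.
mv = [[0.1, 0.2], [0.0, 1.0], [0.5, 0.5]]
ft = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.7]]
meta = fuse_metadata(mv, ft)
latents = [[0.2, 0.1, 0.0, 0.4], [0.9, 0.3, 0.2, 0.1]]
guided = cross_attention(latents, meta, meta)
```

In the actual model this interaction would happen at every diffusion step inside the denoiser; here it is shown once to make the data flow concrete.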

Abstract

Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corrupted regions are available, but manually annotating these masks is labor-intensive and impractical in real-world scenarios. To address this limitation, we introduce a new blind video recovery setting that removes the reliance on predefined masks. This setting presents two major challenges: accurately identifying corrupted regions and recovering content from extensive and irregular degradations. We propose a Metadata-Guided Diffusion Model (M-GDM) to tackle these challenges. Specifically, intrinsic video metadata are leveraged as corruption indicators through a dual-stream metadata encoder that separately embeds motion vectors and frame types before fusing them into a unified representation. This representation interacts with corrupted latent features via cross-attention at each diffusion step. To preserve intact regions, we design a prior-driven mask predictor that generates pseudo masks using both metadata and diffusion priors, enabling the separation and recombination of intact and recovered regions through hard masking. To mitigate boundary artifacts caused by imperfect masks, a post-refinement module enhances consistency between intact and recovered regions. Extensive experiments demonstrate the effectiveness of our method and its superiority in blind video recovery. Code is available at: https://github.com/Shuyun-Wang/M-GDM.
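The hard-masking and post-refinement steps from the abstract can be illustrated on a 1-D toy latent: a binary pseudo mask selects recovered values inside corrupted regions and preserves intact values elsewhere, then a simple smoothing pass reduces seams at mask boundaries. This is a hedged sketch under toy assumptions; the real method operates on latent feature maps, and its refinement module is learned, not the hand-written blend used here.

```python
def recombine(intact, recovered, mask):
    """Hard masking: keep intact latents where mask == 0,
    take recovered latents where mask == 1."""
    return [r if m else i for i, r, m in zip(intact, recovered, mask)]

def refine_boundaries(latent, mask, alpha=0.5):
    """Toy stand-in for post-refinement: blend values adjacent to a mask
    transition with their neighbors to soften visible seams."""
    out = list(latent)
    for t in range(1, len(latent) - 1):
        if mask[t] != mask[t - 1] or mask[t] != mask[t + 1]:
            out[t] = (1 - alpha) * latent[t] + alpha * 0.5 * (latent[t - 1] + latent[t + 1])
    return out

intact    = [1.0, 1.0, 1.0, 1.0, 1.0]   # preserved regions
recovered = [0.9, 0.8, 0.2, 0.3, 0.9]   # diffusion output
mask      = [0, 0, 1, 1, 0]             # pseudo mask: 1 = corrupted
z = recombine(intact, recovered, mask)  # -> [1.0, 1.0, 0.2, 0.3, 1.0]
z_refined = refine_boundaries(z, mask)
```

The hard recombination guarantees that intact regions pass through unchanged; the refinement step only touches positions next to a 0/1 transition, which is exactly where imperfect pseudo masks produce boundary artifacts.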