Frames2Residual: Spatiotemporal Decoupling for Self-Supervised Video Denoising
arXiv cs.CV / 3/12/2026
Key Points
- The paper introduces Frames2Residual (F2R), a self-supervised video denoising framework that decouples spatiotemporal training into two stages: blind temporal consistency modeling and non-blind spatial texture recovery.
- Stage 1 uses a frame-wise blind temporal estimator to learn inter-frame consistency and produce a temporally stable anchor without relying on center-pixel masking.
- Stage 2 employs a non-blind spatial refiner that uses the temporal anchor to safely reintroduce the center frame and recover high-frequency spatial residuals while preserving temporal stability.
- Experiments show that F2R outperforms existing self-supervised methods on both sRGB and raw video benchmarks, supporting the effectiveness of spatiotemporal decoupling for video denoising.
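The two stages above can be caricatured in a few lines. The sketch below is purely illustrative and is not the paper's method: F2R trains learned networks for both stages, whereas here the "blind temporal estimator" is just a mean over neighboring frames (which never sees the center frame, mirroring the blind property), and the "non-blind refiner" is a blend that reintroduces the center frame against the anchor. The names `temporal_anchor`, `spatial_refine`, and the blend weight `alpha` are my own, not from the paper.

```python
import numpy as np

def temporal_anchor(frames, t):
    """Stage 1 (illustrative stand-in): blind temporal estimate for frame t.
    Averages the adjacent frames while excluding frame t itself, so the
    estimate never observes the noise of the frame it predicts."""
    neighbors = [frames[i] for i in (t - 1, t + 1) if 0 <= i < len(frames)]
    return np.mean(neighbors, axis=0)

def spatial_refine(anchor, center, alpha=0.3):
    """Stage 2 (illustrative stand-in): non-blind refinement. With the
    temporally stable anchor as reference, the noisy center frame is
    partially blended back in to recover spatial detail."""
    return anchor + alpha * (center - anchor)

if __name__ == "__main__":
    # Toy demo: a static clean scene corrupted by Gaussian noise.
    rng = np.random.default_rng(0)
    clean = np.ones((3, 64, 64))
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)

    anchor = temporal_anchor(noisy, t=1)          # blind temporal stage
    denoised = spatial_refine(anchor, noisy[1])   # non-blind spatial stage

    mse_in = np.mean((noisy[1] - clean[1]) ** 2)
    mse_out = np.mean((denoised - clean[1]) ** 2)
    print(f"input MSE {mse_in:.4f} -> output MSE {mse_out:.4f}")
```

Even this crude blend lowers the error on a static scene, because the anchor's noise is averaged down before the center frame is reintroduced; the paper's contribution is making both stages learned while keeping that same decoupling.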