SMFD-UNet: Semantic Face Mask Is The Only Thing You Need To Deblur Faces

arXiv cs.CV / 4/10/2026

Key Points

  • The paper introduces SMFD-UNet, a lightweight UNet-based framework for facial image deblurring that uses semantic face masks to recover sharper identity- and structure-specific details from blurry inputs.
  • It follows a dual-step approach: generating detailed facial component masks (eyes, nose, mouth) directly from blurry photos, then producing the restored image via multi-stage feature fusion between the masks and the input.
  • The authors report that SMFD-UNet outperforms state-of-the-art methods on CelebA, achieving higher PSNR and SSIM while keeping naturalness-oriented metrics (NIQE, LPIPS, FID) at satisfactory levels.
  • A randomized blurring pipeline is used to simulate ~1.74 trillion degradation scenarios to improve robustness under diverse real-world blur conditions.
  • The authors highlight architectural choices such as residual dense convolution (RDC) blocks, CBAM attention, and efficient upsampling and post-processing as what keeps the method scalable and computationally efficient.
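A scenario count on the order of 1.74 trillion most plausibly comes from multiplying the number of options for each independent degradation parameter. The summary does not list the paper's actual parameters, so the minimal sketch below only illustrates that idea. The grid `PARAM_GRID` and the helpers `count_scenarios`, `sample_scenario`, and `blur_kernel` are hypothetical. This grid yields 3,360 combinations, not the paper's figure.

```python
import math
import random

# HYPOTHETICAL degradation grid, invented for illustration. These are NOT the
# paper's settings and do not reproduce its ~1.74 trillion scenario count.
PARAM_GRID = {
    "kernel_type": ["gaussian", "box"],
    "kernel_size": [3, 5, 7, 9, 11],
    "sigma": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "noise_std": [0.0, 0.01, 0.02, 0.05],
    "jpeg_quality": list(range(30, 96, 5)),
}


def count_scenarios(grid):
    """Distinct degradations = product of the number of choices per parameter."""
    return math.prod(len(values) for values in grid.values())


def sample_scenario(grid, rng):
    """Draw one degradation configuration uniformly at random from the grid."""
    return {name: rng.choice(values) for name, values in grid.items()}


def blur_kernel(kernel_type, size, sigma):
    """Build a normalized size x size blur kernel whose weights sum to 1.

    `sigma` is only used by the Gaussian kernel; the box kernel ignores it.
    """
    half = size // 2
    if kernel_type == "box":
        weights = [[1.0] * size for _ in range(size)]
    else:  # isotropic Gaussian centred on the middle tap
        weights = [
            [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)
        ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the sampled configuration is repeatable
    print("scenarios:", count_scenarios(PARAM_GRID))
    cfg = sample_scenario(PARAM_GRID, rng)
    kernel = blur_kernel(cfg["kernel_type"], cfg["kernel_size"], cfg["sigma"])
    print(cfg, "kernel sum:", round(sum(map(sum, kernel)), 6))
```

Each added parameter multiplies the total by its number of options, which is how a few fine-grained grids can reach the trillions. Noise level and JPEG quality are only sampled here, not applied to an image.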

Abstract

Facial image deblurring, the restoration of high-quality images from blurry inputs, is an essential task in computer vision, with applications including facial identification, forensic analysis, photographic enhancement, and medical imaging diagnostics. Traditional deblurring techniques, often built on generic image priors, struggle to capture the specific structural and identity-related features of human faces. To address these difficulties, we present SMFD-UNet (Semantic Mask Fusion Deblurring UNet), a new lightweight framework that uses semantic face masks to guide the deblurring process, removing the need for high-quality reference photos. Our dual-step method first uses a UNet-based semantic mask generator to extract detailed facial component masks (e.g., eyes, nose, mouth) directly from blurry photos. It then integrates these masks with the blurry input through a multi-stage feature fusion technique within a computationally efficient UNet framework to produce sharp, high-fidelity facial images. To improve robustness, we created a randomized blurring pipeline that approximates real-world conditions by simulating roughly 1.74 trillion degradation scenarios. Evaluated on the CelebA dataset, SMFD-UNet outperforms state-of-the-art models, attaining higher Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) while maintaining satisfactory naturalness measures, including NIQE, LPIPS, and FID. Its lightweight design, built on Residual Dense Convolution Blocks (RDC), a multi-stage feature fusion strategy, efficient and effective upsampling, attention mechanisms such as CBAM, and post-processing, provides scalability and efficiency, making SMFD-UNet a flexible solution for advancing facial image restoration research and practical applications.
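The headline results are reported as PSNR and SSIM. For reference, here is a minimal pure-Python sketch of both metrics on flattened grayscale pixel values scaled to [0, 1]. The function names are hypothetical, and `global_ssim` is a deliberate simplification: standard SSIM averages the same formula over local, usually Gaussian-weighted windows rather than computing it once over the whole image. For real evaluations, scikit-image provides `skimage.metrics.peak_signal_noise_ratio` and `skimage.metrics.structural_similarity`. The summary does not say which implementation the authors used.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(x, y, max_value=1.0):
    """SSIM computed from whole-image (population) statistics.

    Simplification: standard SSIM evaluates this expression in local windows
    and averages the results; here a single window spans the entire image.
    """
    n = len(x)
    c1 = (0.01 * max_value) ** 2  # stabilizers with the usual K1=0.01, K2=0.03
    c2 = (0.03 * max_value) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


if __name__ == "__main__":
    sharp = [0.0, 0.25, 0.5, 0.75, 1.0]
    restored = [0.1, 0.25, 0.5, 0.75, 0.9]  # edges slightly smoothed
    print("PSNR (dB):", psnr(sharp, restored))
    print("SSIM:", global_ssim(sharp, restored))
```

Higher is better for both metrics. PSNR rises as mean squared error falls (each halving of MSE adds about 3 dB), and SSIM reaches 1 only when means, variances, and covariance all match.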
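CBAM (the Convolutional Block Attention Module of Woo et al., 2018) applies channel attention followed by spatial attention. The summary does not specify how SMFD-UNet configures it, so this sketch follows the generic CBAM recipe in pure Python on a C x H x W nested list. A shared two-layer MLP scores average- and max-pooled channel descriptors, then a k x k convolution over channel-wise average and max maps scores each pixel. The function names, the bias-free MLP, and the caller-supplied weights are assumptions for illustration. A trained model would implement the same steps as framework layers, for example in PyTorch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def channel_attention(feat, w1, w2):
    """feat: C x H x W; w1: (C // r) x C; w2: C x (C // r). Rescales each channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]

    def mlp(v):  # shared MLP: linear -> ReLU -> linear (biases omitted here)
        hidden = [max(0.0, h) for h in matvec(w1, v)]
        return matvec(w2, hidden)

    scale = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[x * s for x in row] for row in ch] for ch, s in zip(feat, scale)]


def spatial_attention(feat, kernel):
    """kernel: 2 x k x k weights over the stacked [avg; max] channel-pooled maps."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg_map = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled = [avg_map, max_map]
    k = len(kernel[0])
    pad = k // 2  # "same" output size with zero padding
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):  # cross-correlation, as deep-learning conv layers compute
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            attn[i][j] = sigmoid(acc)
    return [[[feat[ch][i][j] * attn[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]


def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention, as in the CBAM design."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)
```

With all-zero weights every sigmoid outputs 0.5, so `cbam` simply scales the input by 0.25. This is a handy sanity check before plugging in trained weights.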