A2BFR: Attribute-Aware Blind Face Restoration

arXiv cs.CV / 4/1/2026


Key Points

  • The paper proposes A2BFR, an attribute-aware framework for blind face restoration that combines high-fidelity reconstruction with prompt-controllable generation.
  • It uses a Diffusion Transformer with unified image-text cross-modal attention, conditioning the denoising process on both the degraded face input and a textual prompt.
  • The method introduces attribute-aware learning by supervising denoising latents with facial attribute embeddings from an attribute-aware encoder to improve semantic priors.
  • To strengthen controllability, it adds semantic dual-training based on pairwise attribute variations from a newly curated AttrFace-90K dataset, encouraging attribute discrimination while maintaining restoration fidelity.
  • Experiments report state-of-the-art results in both restoration fidelity and instruction adherence: a -0.0467 LPIPS improvement and +52.58% higher attribute accuracy over diffusion-based BFR baselines, with fine-grained instruction-following maintained even under severe degradations.
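The conditioning scheme in the second bullet can be pictured as a single attention step in which the denoising latent tokens query one concatenated sequence of degraded-image and prompt tokens, so image and text compete in the same attention map. A minimal NumPy sketch of that idea — the function name, dimensions, and random projections are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unified_cross_modal_attention(latent, image_tokens, text_tokens, d_k=16):
    """Denoising latents attend jointly over degraded-image and text tokens.

    latent:       (n_lat, d)  denoising latent tokens (queries)
    image_tokens: (n_img, d)  tokens from the degraded face input
    text_tokens:  (n_txt, d)  tokens from the textual prompt
    """
    rng = np.random.default_rng(0)
    d = latent.shape[1]
    # Illustrative random projections; a trained model learns these weights.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)

    # Unified context: image and text tokens form one key/value sequence,
    # so a single attention map mixes both conditioning modalities.
    context = np.concatenate([image_tokens, text_tokens], axis=0)
    q, k, v = latent @ W_q, context @ W_k, context @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_k))   # (n_lat, n_img + n_txt)
    return attn @ v                          # (n_lat, d_k)
```

Concatenating both modalities into one key/value sequence (rather than running two separate cross-attention passes) is one common way to realize "unified image-text cross-modal attention" in a Diffusion Transformer block.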

Abstract

Blind face restoration (BFR) aims to recover high-quality facial images from degraded inputs, yet its inherently ill-posed nature leads to ambiguous and uncontrollable solutions. Recent diffusion-based BFR methods improve perceptual quality but remain uncontrollable, whereas text-guided face editing enables attribute manipulation without reliable restoration. To address these issues, we propose A^2BFR, an attribute-aware blind face restoration framework that unifies high-fidelity reconstruction with prompt-controllable generation. Built upon a Diffusion Transformer backbone with unified image-text cross-modal attention, A^2BFR jointly conditions the denoising trajectory on both degraded inputs and textual prompts. To inject semantic priors, we introduce attribute-aware learning, which supervises denoising latents using facial attribute embeddings extracted by an attribute-aware encoder. To further enhance prompt controllability, we introduce semantic dual-training, which leverages the pairwise attribute variations in our newly curated AttrFace-90K dataset to enforce attribute discrimination while preserving fidelity. Extensive experiments demonstrate that A^2BFR achieves state-of-the-art performance in both restoration fidelity and instruction adherence, outperforming diffusion-based BFR baselines by -0.0467 LPIPS and +52.58% attribute accuracy, while enabling fine-grained, prompt-controllable restoration even under severe degradations.
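The two training signals described above — attribute-aware supervision of denoising latents and semantic dual-training over pairwise attribute variations — can be sketched as a combined loss: pull a pooled denoising latent toward the attribute embedding of its target, while a margin term keeps it separable from the embedding of the paired sample with one attribute varied. This is a hypothetical illustration of the concept, not the paper's exact objective; all names and the margin value are assumptions:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def attribute_aware_loss(latent_vec, attr_pos, attr_neg, margin=0.2):
    """Illustrative combination of the two supervision signals.

    latent_vec: pooled denoising latent for the current sample
    attr_pos:   attribute embedding of the matching target attributes
    attr_neg:   attribute embedding of the pairwise-varied counterpart
                (same face, one attribute changed, as in AttrFace-90K)
    """
    # Attribute-aware learning: align the latent with its attribute embedding.
    align = 1.0 - cosine(latent_vec, attr_pos)
    # Semantic dual-training: hinge term enforcing attribute discrimination,
    # so the varied attribute stays distinguishable in latent space.
    discr = max(0.0, margin + cosine(latent_vec, attr_neg)
                            - cosine(latent_vec, attr_pos))
    return align + discr
```

Under this sketch, a latent already aligned with its target attributes incurs a near-zero loss, while one closer to the attribute-flipped counterpart pays both the alignment and the margin penalty, which is the intended discrimination-plus-fidelity trade-off.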