MVRD-Bench: Multi-View Learning and Benchmarking for Dynamic Remote Photoplethysmography under Occlusion

arXiv cs.CV / 3/25/2026


Key Points

  • The paper introduces MVRD-Bench (MVRD), a multi-view remote photoplethysmography dataset with synchronized facial videos from three viewpoints, recorded under stationary, speaking, and head-movement scenarios to better match real-world conditions.
  • It proposes MVRD-rPPG, a unified multi-view learning framework that fuses complementary cues to improve robustness when facial skin coverage is partially lost due to motion-induced occlusion.
  • The framework combines an Adaptive Temporal Optical Compensation (ATOC) module for motion artifact suppression, a Rhythm-Visual Dual-Stream Network that disentangles rhythmic and appearance-related features, and Multi-View Correlation-Aware Attention (MVCA) for adaptive view-wise signal aggregation (a minimal sketch of the fusion step follows this list).
  • It adds a Correlation Frequency Adversarial (CFA) training strategy that jointly enforces temporal accuracy, spectral consistency, and perceptual realism in the estimated physiological signals (an illustrative loss sketch also appears below).
  • Experiments on the MVRD dataset show strong performance under movement, including an MAE of 0.90 and a Pearson correlation coefficient (R) of 0.99, and the authors state that the code and dataset will be released.
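
The summary gives no implementation details for MVCA; as a rough illustration of what correlation-aware multi-view fusion could look like, here is a minimal PyTorch sketch. The module name (MVCAFusion), the tensor shapes, and the cosine-correlation prior are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only: correlation-aware attention fusion over
# per-view rPPG feature sequences. Names and shapes (MVCAFusion,
# n_views, feat_dim) are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class MVCAFusion(nn.Module):
    """Weight each view's temporal features by how strongly they
    correlate with the other views, then aggregate across views."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # learned per-view content score

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_views, time, feat_dim)
        pooled = feats.mean(dim=2)                        # (b, v, d)
        normed = nn.functional.normalize(pooled, dim=-1)
        # Cross-view cosine correlation matrix.
        corr = normed @ normed.transpose(1, 2)            # (b, v, v)
        # A view that agrees with the others gets a higher prior weight.
        consensus = corr.mean(dim=-1, keepdim=True)       # (b, v, 1)
        logits = self.score(pooled) + consensus           # (b, v, 1)
        weights = torch.softmax(logits, dim=1)            # softmax over views
        # Weighted sum over views -> one fused temporal feature map.
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)  # (b, t, d)
        return fused
```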

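In the same hedged spirit, the CFA objective can be pictured as a weighted sum of a temporal correlation term, a frequency-domain consistency term, and an adversarial realism term. The loss weights, the Pearson-based temporal term, and the discriminator interface below are assumptions; the paper's exact formulation is not given in this summary.

```python
# Illustrative composite loss in the spirit of CFA training: temporal
# correlation + spectral consistency + GAN-style realism. Weightings
# and the discriminator interface are assumptions.
import torch
import torch.nn.functional as F

def pearson_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # 1 - Pearson r per sequence; pred/target: (batch, time)
    p = pred - pred.mean(dim=1, keepdim=True)
    t = target - target.mean(dim=1, keepdim=True)
    r = (p * t).sum(dim=1) / (p.norm(dim=1) * t.norm(dim=1) + 1e-8)
    return (1.0 - r).mean()

def spectral_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Match magnitude spectra so the dominant heart-rate peak aligns.
    pf = torch.fft.rfft(pred, dim=1).abs()
    tf = torch.fft.rfft(target, dim=1).abs()
    return F.l1_loss(pf, tf)

def cfa_style_loss(pred, target, disc_logits_fake,
                   w_time=1.0, w_freq=1.0, w_adv=0.1):
    # disc_logits_fake: a discriminator's output on the predicted signal;
    # the generator is rewarded when its predictions look "real".
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return (w_time * pearson_loss(pred, target)
            + w_freq * spectral_loss(pred, target)
            + w_adv * adv)
```
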
Abstract

Remote photoplethysmography (rPPG) is a non-contact technique that estimates physiological signals by analyzing subtle skin color changes in facial videos. Existing rPPG methods often encounter performance degradation under facial motion and occlusion scenarios due to their reliance on static and single-view facial videos. Thus, this work focuses on tackling the motion-induced occlusion problem for rPPG measurement in unconstrained multi-view facial videos. Specifically, we introduce a Multi-View rPPG Dataset (MVRD), a high-quality benchmark dataset featuring synchronized facial videos from three viewpoints under stationary, speaking, and head movement scenarios to better match real-world conditions. We also propose MVRD-rPPG, a unified multi-view rPPG learning framework that fuses complementary visual cues to maintain robust facial skin coverage, especially under motion conditions. Our method integrates an Adaptive Temporal Optical Compensation (ATOC) module for motion artifact suppression, a Rhythm-Visual Dual-Stream Network to disentangle rhythmic and appearance-related features, and a Multi-View Correlation-Aware Attention (MVCA) module for adaptive view-wise signal aggregation. Furthermore, we introduce a Correlation Frequency Adversarial (CFA) learning strategy, which jointly enforces temporal accuracy, spectral consistency, and perceptual realism in the predicted signals. Extensive experiments and ablation studies on the MVRD dataset demonstrate the superiority of our approach. In the MVRD movement scenario, MVRD-rPPG achieves an MAE of 0.90 and a Pearson correlation coefficient (R) of 0.99. The source code and dataset will be made available.
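
For context on the reported numbers, rPPG papers conventionally compute MAE over per-clip heart-rate estimates and Pearson R between predicted and reference rates; the exact evaluation protocol used here is not given in this summary, so the snippet below is only a plausible reading of the two metrics.

```python
# Conventional rPPG evaluation metrics: MAE over per-clip heart-rate
# estimates and Pearson R between predicted and reference rates.
# The evaluation protocol here is an assumption, not the paper's.
import numpy as np

def mae_and_pearson(hr_pred: np.ndarray, hr_true: np.ndarray):
    mae = np.mean(np.abs(hr_pred - hr_true))
    # Pearson correlation coefficient R between the two rate series.
    r = np.corrcoef(hr_pred, hr_true)[0, 1]
    return mae, r

# Example: near-perfect tracking yields a low MAE and R close to 1.
hr_true = np.array([62.0, 75.0, 88.0, 70.0, 95.0])
hr_pred = hr_true + np.array([0.5, -1.2, 0.8, -0.4, 1.1])
print(mae_and_pearson(hr_pred, hr_true))  # small MAE, R near 1
```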