PhysNeXt: Next-Generation Dual-Branch Structured Attention Fusion Network for Remote Photoplethysmography Measurement

arXiv cs.CV / 3/23/2026

Key Points

  • PhysNeXt proposes a dual-input framework that jointly leverages video frames and STMap representations to improve remote photoplethysmography (rPPG).
  • It incorporates a spatio-temporal difference modeling unit, a cross-modal interaction module, and a structured attention-based decoder to make pulse signal extraction more robust.
  • The method aims to combine the full spatiotemporal information of raw videos with the compact, lower-volume STMap representation to mitigate motion and illumination artifacts.
  • Experimental results indicate more stable and fine-grained rPPG signal recovery under challenging conditions, and the authors plan to release the code.
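
For readers unfamiliar with the STMap representation named in the key points, here is a minimal sketch of one common way to build such a map. It is not the authors' pipeline: the function name `build_stmap`, the fixed rectangular grid of regions, and the per-region min-max normalization are all assumptions made for illustration (real systems usually select regions from a detected, aligned face).

```python
import numpy as np

def build_stmap(video, grid=(4, 4)):
    """Turn a face video into a spatial-temporal map (illustrative sketch).

    video: float array of shape (T, H, W, C), i.e. frames, height, width, channels.
    grid:  hypothetical (rows, cols) split of each frame into regions of interest.
    Returns an array of shape (R, T, C) with R = rows * cols.
    """
    t, h, w, c = video.shape
    rows, cols = grid
    # Crop so that height and width divide evenly into the grid.
    h_crop, w_crop = h - h % rows, w - w % cols
    v = video[:, :h_crop, :w_crop, :]
    # Split each frame into blocks and average the pixels inside every block.
    blocks = v.reshape(t, rows, h_crop // rows, cols, w_crop // cols, c)
    means = blocks.mean(axis=(2, 4))                      # (T, rows, cols, C)
    # One row per region, one column per frame.
    stmap = means.reshape(t, rows * cols, c).transpose(1, 0, 2)
    # Min-max normalize every region/channel trace over time.
    lo = stmap.min(axis=1, keepdims=True)
    hi = stmap.max(axis=1, keepdims=True)
    return (stmap - lo) / np.maximum(hi - lo, 1e-8)

# Usage with synthetic data standing in for a cropped face video.
rng = np.random.default_rng(0)
video = rng.random((30, 17, 18, 3))
stmap = build_stmap(video)
print(stmap.shape)
```

The result holds one value per region, frame, and channel, which is why it is so much smaller than the raw video; the spatial averaging is also where fine detail can be lost.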

Abstract

Remote photoplethysmography (rPPG) enables contactless measurement of heart rate and other vital signs by analyzing subtle color variations in facial skin induced by cardiac pulsation. Current rPPG methods are mainly based on either end-to-end modeling from raw videos or intermediate spatial-temporal map (STMap) representations. The former preserves complete spatiotemporal information and can capture subtle heartbeat-related signals, but it also introduces substantial noise from motion artifacts and illumination variations. The latter stacks the temporal color changes of multiple facial regions of interest into compact two-dimensional representations, significantly reducing data volume and computational complexity, although some high-frequency details may be lost. To integrate the complementary strengths of the two approaches, we propose PhysNeXt, a dual-input deep learning framework that jointly exploits video frames and STMap representations. PhysNeXt combines a spatio-temporal difference modeling unit, a cross-modal interaction module, and a structured attention-based decoder, which together enhance the robustness of pulse signal extraction. Experimental results demonstrate that PhysNeXt achieves more stable and fine-grained rPPG signal recovery under challenging conditions, validating the effectiveness of joint modeling of video and STMap representations. The code will be released.
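
To make the link between a recovered pulse signal and a heart rate concrete, the sketch below shows a standard post-processing step: take the dominant spectral peak within a plausible heart-rate band. It is not described in the abstract and is not claimed to be what PhysNeXt does; the function name `estimate_hr_bpm` and the 0.7 to 3.0 Hz band (42 to 180 beats per minute) are assumptions chosen for illustration.

```python
import numpy as np

def estimate_hr_bpm(signal, fps, lo_hz=0.7, hi_hz=3.0):
    """Estimate heart rate from a 1-D pulse signal (illustrative sketch).

    signal: recovered rPPG waveform, one sample per video frame.
    fps:    sampling rate in frames per second.
    lo_hz, hi_hz: assumed heart-rate search band in Hz.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    window = np.hanning(len(x))               # reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz                     # Hz to beats per minute

# Usage: a clean 1.2 Hz synthetic pulse sampled at 30 fps for 10 seconds.
fps = 30
t = np.arange(10 * fps) / fps
pulse = np.sin(2 * np.pi * 1.2 * t)
hr = estimate_hr_bpm(pulse, fps)
print(round(hr, 1))
```

With a 10-second window the frequency resolution is 0.1 Hz, or 6 beats per minute, so longer windows or spectral interpolation are needed for finer estimates.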