Aletheia: Physics-Conditioned Localized Artifact Attention (PhyLAA-X) for End-to-End Generalizable and Robust Deepfake Video Detection

arXiv cs.CV / 4/21/2026


Key Points

  • The paper introduces PhyLAA-X (Aletheia), a physics-conditioned extension of Localized Artifact Attention aimed at more robust deepfake video detection under cross-generator shifts, compression, and adversarial attacks.
  • It injects three end-to-end differentiable physics-derived feature volumes—optical-flow curl, specular-reflectance skewness, and rPPG power spectra—into the attention mechanism via cross-attention gating and adds a resonance consistency loss to tie learning to physical invariants.
  • The approach is embedded across three spatiotemporal backbones (EfficientNet-B4+BiLSTM, ResNeXt-101+Transformer, Xception+causal Conv1D) combined in an efficient ensemble with uncertainty-aware adaptive weighting, reaching 97.2% accuracy / 0.992 AUC-ROC on FaceForensics++ (c23), 94.9% / 0.981 on Celeb-DF v2, and 90.8% / 0.966 on DFDC.
  • Reported results show a 4.1–7.3% improvement over the strongest published baseline (LAA-Net) in cross-generator settings and 79.4% accuracy under PGD-10 attacks at epsilon = 0.02; single-backbone ablations attribute a 4.2% cross-dataset AUC gain to PhyLAA-X alone.
  • The full production system is open-sourced on GitHub (v1.2, April 2026) together with pretrained weights, the adversarial corpus (referred to as ADC-2026), and complete reproducibility artifacts.
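
To make the first of the three physics-derived features concrete, the sketch below computes the scalar curl of a 2D optical-flow field, dv/dx - du/dy, with finite differences. It is only an illustration of the quantity: the function name `flow_curl`, the list-of-rows layout, and the border handling are assumptions made here, and the paper's differentiable in-network version is not reproduced.

```python
def flow_curl(u, v):
    """Return the scalar curl dv/dx - du/dy of a 2D flow field.

    u and v are lists of rows (indexed [y][x]) holding the horizontal and
    vertical flow components. Hypothetical helper, not the paper's code.
    """
    h, w = len(u), len(u[0])

    def diff(f, y, x, dy, dx):
        # Central difference in the interior, one-sided at the borders.
        y0, y1 = max(y - dy, 0), min(y + dy, h - 1)
        x0, x1 = max(x - dx, 0), min(x + dx, w - 1)
        span = (y1 - y0) + (x1 - x0)
        return (f[y1][x1] - f[y0][x0]) / span if span else 0.0

    return [
        [diff(v, y, x, 0, 1) - diff(u, y, x, 1, 0) for x in range(w)]
        for y in range(h)
    ]


# Rigid rotation about the origin: u = -y, v = x, so the curl is 2 everywhere.
rot_u = [[-float(y) for x in range(4)] for y in range(4)]
rot_v = [[float(x) for x in range(4)] for y in range(4)]
rotation_curl = flow_curl(rot_u, rot_v)

# Pure translation: constant flow, so the curl is 0 everywhere.
shift_u = [[1.0] * 4 for _ in range(4)]
shift_v = [[0.0] * 4 for _ in range(4)]
translation_curl = flow_curl(shift_u, shift_v)
```

A rigid head motion yields a smooth curl map, whereas a blending seam between a swapped face and its background would be expected to show up as a localized spike; that intuition is how the summary's "optical-flow discontinuities" is read here.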

Abstract

State-of-the-art deepfake detectors achieve near-perfect in-domain accuracy yet degrade under cross-generator shifts, heavy compression, and adversarial perturbations. The core limitation remains the decoupling of semantic artifact learning from physical invariants: optical-flow discontinuities, specular-reflection inconsistencies, and cardiac-modulated reflectance (rPPG) are treated either as post-hoc features or ignored. We introduce PhyLAA-X, a novel physics-conditioned extension of Localized Artifact Attention (LAA-X). PhyLAA-X injects three end-to-end differentiable physics-derived feature volumes - optical-flow curl, specular-reflectance skewness, and spatially-upsampled rPPG power spectra - directly into the LAA-X attention computation via cross-attention gating and a resonance consistency loss. This forces the network to learn manipulation boundaries where semantic inconsistencies and physical violations co-occur - regions inherently harder for generative models to replicate consistently. PhyLAA-X is embedded across an efficient spatiotemporal ensemble (EfficientNet-B4+BiLSTM, ResNeXt-101+Transformer, Xception+causal Conv1D) with uncertainty-aware adaptive weighting. On FaceForensics++ (c23), Aletheia reaches 97.2% accuracy / 0.992 AUC-ROC; on Celeb-DF v2, 94.9% / 0.981; on DFDC, 90.8% / 0.966 - outperforming the strongest published baseline (LAA-Net [1]) by 4.1-7.3% in cross-generator settings and maintaining 79.4% accuracy under epsilon = 0.02 PGD-10 attacks. Single-backbone ablations confirm PhyLAA-X alone delivers a 4.2% cross-dataset AUC gain. The full production system is open-sourced at https://github.com/devghori1264/Aletheia (v1.2, April 2026) with pretrained weights, the adversarial corpus (referred to as ADC-2026 in this work), and complete reproducibility artifacts.
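
For readers unfamiliar with the robustness setting quoted above, the following is a minimal sketch of an L-infinity PGD attack with 10 steps and epsilon = 0.02. The "detector" is a toy logistic-regression model chosen so the gradient can be written by hand; the function name `pgd_linf`, the step size `alpha`, the weights, and the input are all assumptions for illustration, and nothing here reflects the paper's actual attack configuration beyond the step count and epsilon.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Toy detector: probability that input x is 'fake' (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_linf(x, y, w, b, eps=0.02, steps=10, alpha=0.005):
    """Projected gradient descent attack inside an L-infinity ball.

    Each step moves along the sign of the loss gradient, then projects back
    into [x - eps, x + eps] and the valid pixel range [0, 1].
    alpha is an assumed step size, not a value from the paper.
    """
    x_adv = list(x)
    for _ in range(steps):
        p = predict(x_adv, w, b)
        # Gradient of binary cross-entropy w.r.t. the input of a logistic model.
        grad = [(p - y) * wi for wi in w]
        stepped = [
            xi + alpha * math.copysign(1.0, g) if g != 0.0 else xi
            for xi, g in zip(x_adv, grad)
        ]
        x_adv = [
            min(max(s, max(x0 - eps, 0.0)), min(x0 + eps, 1.0))
            for s, x0 in zip(stepped, x)
        ]
    return x_adv


# Hypothetical 4-feature input labelled as fake (y = 1).
x_clean = [0.5, 0.5, 0.5, 0.5]
weights = [2.0, -1.0, 0.5, -3.0]
bias = 0.0

x_attacked = pgd_linf(x_clean, 1.0, weights, bias)
p_clean = predict(x_clean, weights, bias)
p_attacked = predict(x_attacked, weights, bias)
```

The attack lowers the detector's confidence in the true label while keeping every feature within 0.02 of its original value, which is the perturbation budget under which the 79.4% accuracy figure is reported.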