Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts

arXiv cs.CV / 3/23/2026


Key Points

  • This study evaluates Test-Time Adaptation (TTA) for facial expression recognition (FER) under natural cross-dataset distribution shifts, addressing real-world domain changes beyond synthetic corruptions.
  • It conducts cross-dataset FER experiments to assess how different collection protocols, annotation standards, and demographics affect performance.
  • Results show TTA can boost FER performance under natural shifts by up to 11.34%, with entropy-minimization methods like TENT and SAR performing best when the target distribution is clean.
  • Other TTA families excel under other conditions: prototype-adjustment methods such as T3A perform best at larger distributional distances, while feature-alignment methods such as SHOT yield the largest gains when the target distribution is noisier than the source. Overall, TTA effectiveness depends on the distributional distance and the severity of the shift.
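Entropy minimization, the idea behind TENT, updates a small set of normalization parameters so that the model's predictions on unlabeled test batches become more confident. What follows is a minimal NumPy illustration under stated assumptions, not the paper's or TENT's reference implementation. It assumes precomputed target features `x`, a frozen linear head (`W`, `b`), and a single normalization layer whose per-feature scale `gamma` and shift `beta` are the only adapted parameters. Gradients are derived by hand, and all function names and data are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(p):
    # Shannon entropy of each prediction, averaged over the batch.
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def tent_style_adapt(x, W, b, steps=50, lr=0.1, eps=1e-5):
    """Hypothetical TENT-style sketch: adapt only a per-feature scale/shift.

    x: (N, D) unlabeled test features; W: (D, K) and b: (K,) form a frozen head.
    Returns (gamma, beta, entropy_before, entropy_after).
    """
    # Normalize with the statistics of the *test batch*, as TENT does for BatchNorm.
    xhat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    gamma = np.ones(x.shape[1])
    beta = np.zeros(x.shape[1])
    n = x.shape[0]
    history = []
    for _ in range(steps):
        h = gamma * xhat + beta
        p = softmax(h @ W + b)
        history.append(mean_entropy(p))
        ent = -(p * np.log(p + 1e-12)).sum(axis=1, keepdims=True)
        # d(entropy)/d(logits) = -p * (log p + H); divide by n for the batch mean.
        g_z = -p * (np.log(p + 1e-12) + ent) / n
        g_h = g_z @ W.T  # back through the frozen linear head
        gamma -= lr * (g_h * xhat).sum(axis=0)
        beta -= lr * g_h.sum(axis=0)
    p = softmax((gamma * xhat + beta) @ W + b)
    return gamma, beta, history[0], mean_entropy(p)

# Usage with synthetic stand-ins for shifted target features (7 classes assumed).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(64, 8))
W = rng.normal(size=(8, 7)) * 0.5
b = np.zeros(7)
gamma, beta, before, after = tent_style_adapt(x, W, b)
print(f"mean entropy before={before:.3f} after={after:.3f}")
```

Restricting updates to normalization parameters keeps adaptation cheap and limits how far the model can drift. Methods such as SAR add further safeguards (for example, filtering unreliable samples) that this sketch omits.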

Abstract

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models on unlabeled test data during inference, without access to labeled source data. We present the first evaluation of TTA methods for facial expression recognition (FER) under natural domain shifts, performing cross-dataset experiments with widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods like T3A excel in scenarios with larger distributional distance. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than our source. Our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance and the severity of the natural shift across domains.
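Prototype adjustment, as in T3A, needs no backpropagation. It keeps a per-class support set of confident test features and classifies by similarity to the resulting class prototypes. The following is a simplified, hypothetical NumPy sketch of that idea, not the paper's code or the reference T3A implementation. It assumes a frozen linear head `W` of shape (classes, features) and precomputed target features `z`. It seeds each support set with the corresponding weight row, pseudo-labels test features with the frozen head, and keeps only the `filter_k` lowest-entropy features per class. The class name `PrototypeAdjuster` and all data are illustrative.

```python
import numpy as np

def l2_normalize(v):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

class PrototypeAdjuster:
    """Hypothetical, simplified T3A-style classifier (backprop-free)."""

    def __init__(self, W, filter_k=20):
        # W: (K, D) frozen head; row c seeds the support set of class c.
        self.W = W
        self.filter_k = filter_k
        self.supports = [[l2_normalize(w)] for w in W]
        self.scores = [[float(entropy(softmax(W @ w)))] for w in W]

    def prototypes(self):
        # Each prototype is the normalized mean of its class's support features.
        return l2_normalize(np.stack([np.mean(s, axis=0) for s in self.supports]))

    def step(self, z):
        """z: (N, D) unlabeled test features. Returns adjusted predictions."""
        p = softmax(z @ self.W.T)  # pseudo-labels from the frozen head
        yhat = p.argmax(axis=1)
        ent = entropy(p)
        for feat, c, h in zip(z, yhat, ent):
            self.supports[c].append(l2_normalize(feat))
            self.scores[c].append(float(h))
        # Keep only the filter_k lowest-entropy (most confident) supports per class.
        for c in range(len(self.supports)):
            order = np.argsort(self.scores[c])[: self.filter_k]
            self.supports[c] = [self.supports[c][i] for i in order]
            self.scores[c] = [self.scores[c][i] for i in order]
        # Classify by similarity to the updated prototypes.
        return (z @ self.prototypes().T).argmax(axis=1)

# Usage with a hypothetical 7-class head over 16-d features.
rng = np.random.default_rng(0)
W = rng.normal(size=(7, 16))
adapter = PrototypeAdjuster(W, filter_k=5)
for _ in range(3):
    z = rng.normal(size=(32, 16))  # stand-in for unlabeled target features
    preds = adapter.step(z)
```

Because the prototypes are rebuilt from target features rather than fixed by source training, this family can move the decision boundaries toward a distant target domain, which is consistent with the abstract's observation about larger distributional distances.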