AcTTA: Rethinking Test-Time Adaptation via Dynamic Activation

arXiv cs.LG / 3/30/2026


Key Points

  • The paper argues that test-time adaptation (TTA) has overemphasized affine modulation of normalization layers and underexplored the role of activation functions in representation dynamics under distribution shift.
  • It proposes AcTTA, an activation-aware framework that reparameterizes common activations (e.g., ReLU, GELU) into learnable, parameterized forms that can adjust response thresholds and gradient sensitivity at test time.
  • AcTTA updates activation behavior adaptively during inference without modifying network weights and without requiring source-domain data, aiming for lightweight domain-shift robustness.
  • Experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C show that AcTTA achieves robust, stable adaptation and consistently outperforms normalization-based TTA methods.
  • The results position activation adaptation as a compact alternative to the prevailing normalization-centric view, potentially broadening the design space for domain-shift-robust test-time learning.
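The summary does not spell out the exact parameterization, but a minimal sketch of the idea behind the key points above — a ReLU with a learnable response threshold and a sensitivity scale (the names `tau` and `alpha` are hypothetical, not from the paper) — could look like:

```python
import numpy as np

def parameterized_relu(x, tau=0.0, alpha=1.0):
    """Illustrative learnable ReLU: tau shifts the response threshold,
    alpha rescales the active region (and hence gradient sensitivity).
    With tau=0 and alpha=1 this reduces to the standard ReLU."""
    return alpha * np.maximum(x - tau, 0.0)

x = np.array([-1.0, 0.5, 2.0])
parameterized_relu(x)                      # standard ReLU behavior
parameterized_relu(x, tau=0.5, alpha=2.0)  # shifted threshold, steeper slope
```

Because `tau` and `alpha` are just two scalars per activation, adapting them at test time touches far fewer parameters than retraining the weights they sit between.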

Abstract

Test-time adaptation (TTA) aims to mitigate performance degradation under distribution shift by updating model parameters during inference. Existing approaches have primarily framed adaptation as affine modulation, recalibrating normalization layers. This perspective, while effective, overlooks another component that shapes representation dynamics: the activation function. We revisit this underexplored space and propose AcTTA, an activation-aware framework that reformulates conventional activation functions (e.g., ReLU, GELU) into parameterized forms that shift their response thresholds and modulate gradient sensitivity, and updates them adaptively at test time. This functional reparameterization enables continuous adjustment of activation behavior under domain shift without modifying network weights or requiring source data. Despite its simplicity, AcTTA achieves robust and stable adaptation across diverse corruptions: on CIFAR10-C, CIFAR100-C, and ImageNet-C, it consistently surpasses normalization-based TTA methods. Our findings position activation adaptation as a compact and effective route toward domain-shift-robust test-time learning, broadening the prevailing affine-centric view of adaptation.
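To make the "adapt activations, freeze weights" idea concrete, here is a toy sketch of such a test-time loop. Everything in it is an assumption for illustration, not the paper's method: the parameterized ReLU form, the entropy-minimization objective (a common unsupervised TTA surrogate), and the finite-difference gradients used to keep the example dependency-free.

```python
import numpy as np

def act(x, tau, alpha):
    # Hypothetical parameterized ReLU: tau = threshold, alpha = slope
    return alpha * np.maximum(x - tau, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    # Mean Shannon entropy of the predictive distribution
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1).mean()

def tta_step(W, x, tau, alpha, lr=0.1, eps=1e-4):
    """One adaptation step: W stays frozen; only the two activation
    parameters receive (finite-difference) gradient updates."""
    def loss(t, a):
        return entropy(softmax(act(x, t, a) @ W))
    g_tau = (loss(tau + eps, alpha) - loss(tau - eps, alpha)) / (2 * eps)
    g_alpha = (loss(tau, alpha + eps) - loss(tau, alpha - eps)) / (2 * eps)
    return tau - lr * g_tau, alpha - lr * g_alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))    # frozen classifier weights
x = rng.normal(size=(32, 8))   # unlabeled test batch
tau, alpha = 0.0, 1.0
before = entropy(softmax(act(x, tau, alpha) @ W))
for _ in range(20):
    tau, alpha = tta_step(W, x, tau, alpha)
after = entropy(softmax(act(x, tau, alpha) @ W))
```

After the loop, `after` is lower than `before`: sharpening the activation (larger `alpha`) scales the logits and reduces prediction entropy, which mirrors the claimed mechanism of adjusting response thresholds and gradient sensitivity without ever writing to `W`.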