ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications

arXiv cs.LG / April 22, 2026


Key Points

  • Batch Normalization (BN) can fail in micro-batch and non-IID federated learning settings, and removing BN often triggers catastrophic training instability in deep networks.
  • The paper argues that common activations like Swish and ReLU worsen BN-free training because their non-zero-centered behavior causes activation mean shifts to compound with network depth.
  • It introduces Zero-Centered Swish (ZC-Swish), a plug-in, parameterized activation designed to keep activation means dynamically anchored near zero.
  • Experiments stress-testing BN-free convolutional networks at depths 8, 16, and 32 show that standard Swish collapses at depth 16 and beyond, while ZC-Swish preserves stable activation dynamics and reaches the paper's best reported test accuracy at depth 16 (51.5%, from a single run with seed 42).
  • The authors position ZC-Swish as a parameter-efficient way to stabilize deep learning for edge deployment and privacy-preserving applications where normalization layers are impractical.
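The summary above does not give ZC-Swish's exact parameterization, so the following is only a minimal illustrative sketch of the general idea, not the paper's definition: standard Swish (`x * sigmoid(x)`) produces a positive output mean even on zero-mean input, and one simple way to anchor the mean near zero is to subtract a batch estimate of it. The function names and the mean-subtraction scheme here are assumptions.

```python
import numpy as np

def swish(x, beta=1.0):
    """Standard Swish: x * sigmoid(beta * x).
    Not zero-centered: on zero-mean input its output mean is positive."""
    return x / (1.0 + np.exp(-beta * x))

def zc_swish(x, beta=1.0):
    """Hypothetical zero-centered Swish sketch: subtract the batch mean
    of the Swish output so activations stay anchored near zero.
    The paper's actual parameterization may differ (e.g. a learnable
    or analytically derived shift rather than a batch statistic)."""
    y = swish(x, beta)
    return y - y.mean()
```

On a standard normal batch, `swish` leaves a clearly positive output mean (roughly 0.2), while `zc_swish` is zero-mean by construction.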

Abstract

Batch Normalization (BN) is a cornerstone of deep learning, yet it fundamentally breaks down in micro-batch regimes (e.g., 3D medical imaging) and non-IID Federated Learning. Removing BN from deep architectures, however, often leads to catastrophic training failures such as vanishing gradients and dying channels. We identify that standard activation functions, like Swish and ReLU, exacerbate this instability in BN-free networks due to their non-zero-centered nature, which causes compounding activation mean-shifts as network depth increases. In this technical communication, we propose Zero-Centered Swish (ZC-Swish), a drop-in activation function parameterized to dynamically anchor activation means near zero. Through targeted stress-testing on BN-free convolutional networks at depths 8, 16, and 32, we demonstrate that while standard Swish collapses to near-random performance at depth 16 and beyond, ZC-Swish maintains stable layer-wise activation dynamics and achieves the highest test accuracy at depth 16 (51.5%) with seed 42. ZC-Swish thus provides a robust, parameter-efficient solution for stabilizing deep networks in memory-constrained and privacy-preserving applications where traditional normalization is unviable.
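The abstract's core claim, that non-zero-centered activations leave mean shifts that persist with depth in BN-free networks, can be illustrated with a toy simulation. This is not the paper's experimental setup: the network is a stack of random linear layers, pre-activations are rescaled to unit standard deviation to isolate the mean effect, and the batch-mean-subtracting `zc_swish` below is an assumed stand-in for the paper's parameterization.

```python
import numpy as np

def swish(x):
    # Standard Swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def zc_swish(x):
    # Assumed zero-centering: subtract the batch mean of the Swish output.
    y = x / (1.0 + np.exp(-x))
    return y - y.mean()

def layer_means(act, depth=32, width=256, batch=512, seed=0):
    """Run a BN-free stack of random linear layers + activation and
    record the post-activation mean at each depth. Pre-activations are
    rescaled to unit std so only the mean behavior differs between
    activations."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    means = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h @ w
        h = h / h.std()          # control variance; isolate the mean shift
        h = act(h)
        means.append(h.mean())
    return means
```

In this toy setting, `layer_means(swish)` stays positive at every depth, while `layer_means(zc_swish)` sits at essentially zero throughout, consistent with the paper's argument that a zero-anchored activation removes the mean shift that BN would otherwise absorb.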