ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications

arXiv cs.LG / April 22, 2026


Key Points

  • Batch Normalization (BN) can fail in micro-batch and non-IID federated learning settings, and removing BN often triggers catastrophic training instability in deep networks.
  • The paper argues that common activations like Swish and ReLU worsen BN-free training because their non-zero-centered behavior causes activation mean shifts to compound with network depth.
  • It introduces Zero-Centered Swish (ZC-Swish), a plug-in, parameterized activation designed to keep activation means dynamically anchored near zero.
  • Experiments stress-testing BN-free convolutional networks at depths 8, 16, and 32 show that standard Swish collapses at depth 16 and beyond, while ZC-Swish preserves stable activation dynamics and reaches the paper's best reported test accuracy at depth 16 (51.5%, from a single run with seed 42).
  • The authors position ZC-Swish as a parameter-efficient way to stabilize deep learning for edge deployment and privacy-preserving applications where normalization layers are impractical.
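The summary above does not give ZC-Swish's exact parameterization, so the following is only a minimal illustrative sketch of the general idea, not the paper's definition: standard Swish (`x * sigmoid(x)`) produces a positive output mean even on zero-mean input, and one simple way to anchor the mean near zero is to subtract a batch estimate of it. The function names and the mean-subtraction scheme here are assumptions.

```python
import numpy as np

def swish(x, beta=1.0):
    """Standard Swish: x * sigmoid(beta * x).
    Not zero-centered: on zero-mean input its output mean is positive."""
    return x / (1.0 + np.exp(-beta * x))

def zc_swish(x, beta=1.0):
    """Hypothetical zero-centered Swish sketch: subtract the batch mean
    of the Swish output so activations stay anchored near zero.
    The paper's actual parameterization may differ (e.g. a learnable
    or analytically derived shift rather than a batch statistic)."""
    y = swish(x, beta)
    return y - y.mean()
```

On a standard normal batch, `swish` leaves a clearly positive output mean (roughly 0.2), while `zc_swish` is zero-mean by construction.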

Abstract

Batch Normalization (BN) is a cornerstone of deep learning, yet it fundamentally breaks down in micro-batch regimes (e.g., 3D medical imaging) and non-IID Federated Learning. Removing BN from deep architectures, however, often leads to catastrophic training failures such as vanishing gradients and dying channels. We identify that standard activation functions, like Swish and ReLU, exacerbate this instability in BN-free networks due to their non-zero-centered nature, which causes compounding activation mean-shifts as network depth increases. In this technical communication, we propose Zero-Centered Swish (ZC-Swish), a drop-in activation function parameterized to dynamically anchor activation means near zero. Through targeted stress-testing on BN-free convolutional networks at depths 8, 16, and 32, we demonstrate that while standard Swish collapses to near-random performance at depth 16 and beyond, ZC-Swish maintains stable layer-wise activation dynamics and achieves the highest test accuracy at depth 16 (51.5%) with seed 42. ZC-Swish thus provides a robust, parameter-efficient solution for stabilizing deep networks in memory-constrained and privacy-preserving applications where traditional normalization is unviable.
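The abstract's core claim, that non-zero-centered activations leave mean shifts that persist with depth in BN-free networks, can be illustrated with a toy simulation. This is not the paper's experimental setup: the network is a stack of random linear layers, pre-activations are rescaled to unit standard deviation to isolate the mean effect, and the batch-mean-subtracting `zc_swish` below is an assumed stand-in for the paper's parameterization.

```python
import numpy as np

def swish(x):
    # Standard Swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def zc_swish(x):
    # Assumed zero-centering: subtract the batch mean of the Swish output.
    y = x / (1.0 + np.exp(-x))
    return y - y.mean()

def layer_means(act, depth=32, width=256, batch=512, seed=0):
    """Run a BN-free stack of random linear layers + activation and
    record the post-activation mean at each depth. Pre-activations are
    rescaled to unit std so only the mean behavior differs between
    activations."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    means = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h @ w
        h = h / h.std()          # control variance; isolate the mean shift
        h = act(h)
        means.append(h.mean())
    return means
```

In this toy setting, `layer_means(swish)` stays positive at every depth, while `layer_means(zc_swish)` sits at essentially zero throughout, consistent with the paper's argument that a zero-anchored activation removes the mean shift that BN would otherwise absorb.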