Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks
arXiv stat.ML / 4/28/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper studies how activation functions beyond ReLU affect the neural tangent kernel (NTK) and the neural network Gaussian process (NNGP) kernel, focusing on activations whose only non-smoothness occurs at zero (a Monte Carlo sketch of the NNGP kernel recursion appears after this list).
- It characterizes the reproducing kernel Hilbert space (RKHS) associated with these kernels, extending existing NTK/NNGP theory to activations such as SELU, ELU, and LeakyReLU.
- The authors analyze several variants and special cases, including architectures without bias terms, two-layer networks, and polynomial activations.
- Results indicate that for many activations that are not infinitely smooth, the resulting RKHS is the same across network depths and depends mainly on the "degree" of non-smoothness at zero, whereas polynomial activations yield depth-dependent RKHSs.
- The work also derives smoothness properties of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.
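To make the objects in these results concrete, here is a minimal sketch (not the paper's code) of the NNGP kernel recursion for a deep fully connected network, estimated by Monte Carlo so that arbitrary activations such as ReLU, LeakyReLU, and ELU can be plugged in. The depth, weight/bias variances, and sample count are illustrative assumptions; ReLU-type activations admit closed-form (arc-cosine) expressions, but Monte Carlo handles the general case.

```python
# Sketch: Monte Carlo estimate of the NNGP kernel K^{(L)}(x1, x2) for an
# infinitely wide MLP with a user-supplied activation. Hyperparameters
# (sigma_w^2, sigma_b^2, depth, n_mc) are illustrative assumptions.
import numpy as np

def nngp_kernel(x1, x2, phi, depth=3, sigma_w2=2.0, sigma_b2=0.1,
                n_mc=200_000, seed=0):
    """Estimate the depth-`depth` NNGP kernel entry K(x1, x2) for activation phi."""
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    # First-layer (pre-activation) kernel entries for the pair (x1, x2).
    k11 = sigma_w2 * (x1 @ x1) / d + sigma_b2
    k22 = sigma_w2 * (x2 @ x2) / d + sigma_b2
    k12 = sigma_w2 * (x1 @ x2) / d + sigma_b2
    for _ in range(depth - 1):
        # Draw (u, v) ~ N(0, Lambda) with Lambda = [[k11, k12], [k12, k22]],
        # then update each entry with E[phi(u) phi(v)] under that Gaussian.
        cov = np.array([[k11, k12], [k12, k22]])
        uv = rng.multivariate_normal(np.zeros(2), cov, size=n_mc)
        fu, fv = phi(uv[:, 0]), phi(uv[:, 1])
        k11 = sigma_w2 * np.mean(fu * fu) + sigma_b2
        k22 = sigma_w2 * np.mean(fv * fv) + sigma_b2
        k12 = sigma_w2 * np.mean(fu * fv) + sigma_b2
    return k12

# Activations with their only non-smoothness at zero (ELU is once differentiable).
relu = lambda z: np.maximum(z, 0.0)
leaky_relu = lambda z: np.where(z > 0, z, 0.1 * z)
elu = lambda z: np.where(z > 0, z, np.expm1(z))

x1 = np.array([1.0, 0.5, -0.3])
x2 = np.array([0.2, -1.0, 0.8])
for name, phi in [("ReLU", relu), ("LeakyReLU", leaky_relu), ("ELU", elu)]:
    print(name, nngp_kernel(x1, x2, phi))
```

Running the script prints one off-diagonal kernel value per activation; how these kernels differ across activations (and depths) is exactly what determines the RKHSs and sample-path smoothness results summarized above.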


