Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness

arXiv cs.LG / 3/26/2026


Key Points

  • The paper argues that adversarial robustness is strongly influenced by activation curvature, measured as the maximum absolute second derivative of the activation function, max|σ''|.
  • It introduces the Recursive Curvature-Tunable Activation Family (RCT-AF) with parameters (α, β) to precisely control curvature and study its effect on robustness.
  • The authors identify a non-monotonic trade-off: too little curvature reduces expressivity, while too much curvature increases the normalized Hessian diagonal norm, producing sharper minima that can degrade robust generalization.
  • Across multiple architectures, datasets, and adversarial training methods, the best adversarial robustness consistently occurs when max|σ''| is in the range 4–10.
  • The study provides theoretical explanations linking activation curvature to Hessian diagonal elements and verifies experimentally that the normalized Hessian diagonal norm shows a U-shaped dependence with its minimum in the robustness-optimal region.
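The exact RCT-AF parameterization is defined in the paper and not reproduced here, but the core idea of a sharpness parameter that directly sets max|σ''| can be sketched with a scaled softplus as a stand-in: softplus_β(x) = log(1 + exp(βx)) / β has σ''(x) = β·s·(1−s) with s = sigmoid(βx), so max|σ''| = β/4. The function names and grid below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softplus_second_derivative(x, beta):
    """sigma''(x) for softplus_beta(x) = log(1 + exp(beta*x)) / beta.

    Stand-in for a curvature-tunable activation: the sharpness parameter
    beta directly scales the peak curvature (max|sigma''| = beta / 4).
    """
    # Numerically stable sigmoid via tanh.
    s = 0.5 * (1.0 + np.tanh(0.5 * beta * x))
    return beta * s * (1.0 - s)

def max_abs_second_derivative(beta):
    """Numerically estimate max|sigma''| over a dense grid (includes x = 0)."""
    grid = np.linspace(-10.0, 10.0, 100001)
    return float(np.max(np.abs(softplus_second_derivative(grid, beta))))
```

With this stand-in, beta = 16 and beta = 40 give max|σ''| = 4 and 10 respectively, i.e. the endpoints of the robustness-optimal range the paper reports; sweeping beta between those values would mimic the curvature sweep described above.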

Abstract

This work investigates the critical role of activation function curvature -- quantified by the maximum second derivative \max|\sigma''| -- in adversarial robustness. Using the Recursive Curvature-Tunable Activation Family (RCT-AF), which enables precise control over curvature through parameters \alpha and \beta, we systematically analyze this relationship. Our study reveals a fundamental trade-off: insufficient curvature limits model expressivity, while excessive curvature amplifies the normalized Hessian diagonal norm of the loss, leading to sharper minima that hinder robust generalization. This results in a non-monotonic relationship where optimal adversarial robustness consistently occurs when \max|\sigma''| falls within the range 4 to 10, a finding that holds across diverse network architectures, datasets, and adversarial training methods. We provide theoretical insights into how activation curvature affects the diagonal elements of the Hessian matrix of the loss, and experimentally demonstrate that the normalized Hessian diagonal norm exhibits a U-shaped dependence on \max|\sigma''|, with its minimum within the optimal robustness range, thereby validating the proposed mechanism.
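The mechanism the abstract describes -- activation curvature feeding into the Hessian diagonal of the loss -- can be seen directly in a one-parameter toy model. For L(w) = (σ_β(w·x) − y)² with σ_β a β-scaled softplus (an illustrative stand-in, not the paper's RCT-AF), the chain rule gives d²L/dw² = 2(σ')²x² + 2(σ − y)·σ''·x², so σ'' enters the Hessian diagonal directly and larger peak curvature inflates it. A minimal sketch under these assumptions:

```python
import numpy as np

def softplus(z, beta):
    # Numerically stable softplus_beta(z) = log(1 + exp(beta*z)) / beta.
    bz = beta * z
    return (np.maximum(bz, 0.0) + np.log1p(np.exp(-np.abs(bz)))) / beta

def hessian_diag(w, x, y, beta):
    """d^2 L / dw^2 for L(w) = (softplus_beta(w*x) - y)^2, by the chain rule."""
    z = w * x
    p = softplus(z, beta)
    s = 0.5 * (1.0 + np.tanh(0.5 * beta * z))  # stable sigmoid(beta*z)
    p1 = s * x                                 # d pred / dw
    p2 = beta * s * (1.0 - s) * x**2           # d^2 pred / dw^2: the sigma'' term
    return 2.0 * p1**2 + 2.0 * (p - y) * p2
```

A central finite difference of L(w) reproduces `hessian_diag` to numerical precision, confirming the σ'' term; in a full network the same term appears in each diagonal Hessian entry, which is why tuning \max|\sigma''| shifts the normalized Hessian diagonal norm studied in the paper.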