Explicit Dropout: Deterministic Regularization for Transformer Architectures

arXiv cs.LG · April 23, 2026


Key Points

  • The paper proposes an “explicit dropout” formulation that turns dropout from stochastic masking into a deterministic additive regularizer embedded directly in the training loss.
  • It derives explicit regularization terms for Transformer components (attention Q/K/V and feed-forward layers), with independently tunable strengths.
  • Experiments on image classification, temporal action detection, and audio classification indicate that explicit dropout can match or outperform conventional implicit (stochastic) dropout methods.
  • Ablation studies show that performance remains stable across settings and that regularization strength can be tuned via the regularization coefficients and dropout rates.
  • The approach aims to provide a more interpretable, fine-grained alternative to stochastic regularization while preserving architectural flexibility.

Abstract

Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than an explicit optimization objective. We propose a deterministic formulation that expresses dropout as an additive regularizer incorporated directly into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering the attention query, key, value, and feed-forward components, each with an independently controllable strength. This formulation removes the reliance on stochastic perturbations while providing clearer, more fine-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies demonstrate stable performance and controllable regularization through the regularization coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
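The abstract does not spell out the paper's Transformer-specific regularizers, but the general idea of replacing stochastic masking with a deterministic loss term can be illustrated in the simplest known setting: for a linear model with squared loss, the expected effect of inverted input dropout has a closed form (a classical result due to Wager et al., 2013). The sketch below is that linear-model case only, not the paper's derivation; the function names and the coefficient `lam` (standing in for the independently tunable per-component strengths the paper describes) are illustrative assumptions.

```python
def explicit_dropout_penalty(w, X, p):
    # Deterministic penalty equal to the *expected* extra loss that
    # inverted input dropout at rate p would add to a linear model
    # with squared loss:
    #   (p / (1 - p)) * sum_i sum_j x_ij^2 * w_j^2
    # (classical linear-model result; illustrative of, not identical
    # to, the paper's Transformer-specific terms).
    scale = p / (1.0 - p)
    return scale * sum(x_ij ** 2 * w_j ** 2
                       for row in X
                       for x_ij, w_j in zip(row, w))

def training_loss(w, X, y, p, lam=1.0):
    # Task loss plus the explicit regularizer; lam plays the role of
    # an independently tunable regularization coefficient.
    preds = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    task = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    return task + lam * explicit_dropout_penalty(w, X, p)

# Example: w = [1, 2], X = [[1, 1]], p = 0.5 gives scale 1 and
# penalty 1*1 + 1*4 = 5; with y = [3] the residual is zero, so the
# total loss is just the penalty.
print(training_loss([1.0, 2.0], [[1.0, 1.0]], [3.0], 0.5))
```

The key property, mirroring the paper's framing, is that no random masks are sampled at training time: the regularization strength is controlled entirely by the deterministic quantities `p` and `lam`.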
