Explicit Dropout: Deterministic Regularization for Transformer Architectures
arXiv cs.LG / 4/23/2026
Key Points
- The paper proposes an “explicit dropout” formulation that turns dropout from stochastic masking into a deterministic additive regularizer embedded directly in the training loss.
- It derives explicit regularization terms for Transformer components (attention Q/K/V projections and feed-forward layers), each with an independently tunable strength (a rough code sketch of this idea follows the list below).
- Experiments on image classification, temporal action detection, and audio classification indicate that explicit dropout can match or outperform conventional implicit (stochastic) dropout methods.
- Ablation results indicate that performance stays stable across settings and that the effective regularization strength can be tuned through the explicit coefficients and the nominal dropout rates.
- The approach aims to provide a more interpretable, fine-grained alternative to stochastic regularization while preserving architectural flexibility.
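The paper's exact regularization terms are not reproduced in this summary. As a rough illustration of the general idea only, the sketch below relies on the classical observation that input dropout at rate p on a linear map y = Wx contributes, in expectation, an input-weighted ridge term of roughly (p / (1 - p)) Σ_j ||W[:, j]||² E[x_j²]; it attaches such terms to the Q/K/V projections and the first feed-forward layer of a toy Transformer block, with independently tunable coefficients. All names and values here (TinyTransformerBlock, explicit_dropout_penalty, lambda_qkv, lambda_ffn, the dropout rates) are illustrative assumptions, not the paper's code.

```python
# A minimal sketch, assuming a standard PyTorch setup; NOT the paper's implementation.
# Stochastic input dropout on selected linear layers is replaced by the deterministic
# penalty (p / (1 - p)) * sum_j ||W[:, j]||^2 * E[x_j^2], applied separately to the
# Q/K/V projections and the first feed-forward layer with independent coefficients.

import torch
import torch.nn as nn


class TinyTransformerBlock(nn.Module):
    """Toy single-head pre-norm block, used only to demonstrate the penalty."""

    def __init__(self, d_model: int = 64, d_ff: int = 256):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ff1 = nn.Linear(d_model, d_ff)
        self.ff2 = nn.Linear(d_ff, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        self._attn_input = h.detach()          # cached for the explicit penalty
        q, k, v = self.q_proj(h), self.k_proj(h), self.v_proj(h)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        x = x + self.out_proj(attn @ v)
        h = self.norm2(x)
        self._ffn_input = h.detach()           # cached for the explicit penalty
        return x + self.ff2(torch.relu(self.ff1(h)))


def explicit_dropout_penalty(block, p_qkv=0.1, p_ffn=0.1,
                             lambda_qkv=1.0, lambda_ffn=1.0):
    """Deterministic surrogate for dropout on the Q/K/V and feed-forward weights."""

    def term(weight, inputs, p):
        # mean of x_j^2 over batch and sequence positions, one value per input dim
        second_moment = inputs.reshape(-1, inputs.shape[-1]).pow(2).mean(dim=0)
        col_norms = weight.pow(2).sum(dim=0)   # ||W[:, j]||^2 per input column j
        return (p / (1.0 - p)) * (col_norms * second_moment).sum()

    reg = lambda_qkv * (term(block.q_proj.weight, block._attn_input, p_qkv)
                        + term(block.k_proj.weight, block._attn_input, p_qkv)
                        + term(block.v_proj.weight, block._attn_input, p_qkv))
    return reg + lambda_ffn * term(block.ff1.weight, block._ffn_input, p_ffn)


if __name__ == "__main__":
    block = TinyTransformerBlock()
    x = torch.randn(8, 16, 64)                 # (batch, sequence, d_model)
    out = block(x)
    task_loss = out.pow(2).mean()              # stand-in for a real objective
    loss = task_loss + explicit_dropout_penalty(block)
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

Because no masks are sampled, the forward pass stays noise-free and the per-component strengths (here lambda_qkv and lambda_ffn) can be tuned independently, which is the spirit of the "explicit dropout" framing summarized above.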