DynamicGate-MLP: Conditional Computation via Learned Structural Dropout and Input-Dependent Gating for Functional Plasticity
arXiv cs.LG / 3/18/2026
Key Points
- The paper introduces DynamicGate-MLP, a framework that combines learned structural dropout, a regularization-style mechanism, with input-dependent conditional computation via learned gates, so the amount of computation adapts to each input.
- It defines continuous gate probabilities and, at inference, discretizes them into a 0/1 execution mask that selects the active path, enabling sample-specific computation.
- Training balances accuracy against a compute budget by penalizing expected gate usage and optimizing the discrete mask with a Straight-Through Estimator (a minimal sketch follows this list).
- The method is evaluated on MNIST, CIFAR-10, Tiny-ImageNet, Speech Commands, and PBMC3k against MLP baselines and MoE-style variants; compute efficiency is measured via gate activation ratios and a layer-weighted MAC metric rather than wall-clock latency (see the second sketch below).
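To make the gating and STE mechanics concrete, here is a minimal PyTorch sketch of how such a gated block could be wired up. It is an illustration assembled from the key points above, not the paper's code: the class name `GatedMLPBlock`, the sigmoid gate head, the 0.5 threshold, and the `usage_penalty` helper with its `lam` coefficient are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLPBlock(nn.Module):
    """Hypothetical gated MLP block: a small gate head produces a
    per-sample keep probability for each hidden unit; a discrete 0/1
    mask selects the active units and is trained with a
    Straight-Through Estimator (STE), as the bullets describe."""

    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_hidden)
        self.gate = nn.Linear(dim_in, dim_hidden)  # input-dependent gate logits

    def forward(self, x: torch.Tensor):
        p = torch.sigmoid(self.gate(x))    # continuous gate probabilities
        hard = (p > 0.5).float()           # discrete execution mask
        # STE: the forward pass uses the hard mask, while gradients
        # flow through the continuous probabilities p.
        mask = hard + p - p.detach()
        h = F.relu(self.fc(x)) * mask
        return h, p

def usage_penalty(p: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """Sparsity term on expected gate usage; lam trades accuracy
    against compute budget (the value here is illustrative)."""
    return lam * p.mean()
```

At inference only the hard mask matters, so units gated to zero could in principle be skipped entirely, which is what makes the computation sample-specific rather than merely regularized.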
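The layer-weighted MAC metric from the last bullet might be computed roughly as follows. The weighting scheme, averaging per-layer gate activation ratios weighted by each layer's dense multiply-accumulate count, is inferred from the description, and the function name is an assumption.

```python
def layer_weighted_macs(gate_ratios, layer_macs):
    """Hypothetical compute metric: each layer's gate activation ratio
    weighted by that layer's full MAC count, so wider layers contribute
    proportionally more to the reported compute."""
    total = sum(layer_macs)
    return sum(r * m for r, m in zip(gate_ratios, layer_macs)) / total

# Example: three layers with 60% / 30% / 80% of gates active.
print(layer_weighted_macs([0.6, 0.3, 0.8], [1e6, 4e6, 2e6]))
# ≈ 0.486 → the gated model uses roughly 49% of the dense model's MACs.
```

Reporting this ratio instead of wall-clock latency isolates the algorithmic savings from hardware and kernel effects, which sparse masking often fails to translate into real speedups.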
Related Articles
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Dev.to
Dual-Criterion Curriculum Learning: Application to Temporal Data
arXiv cs.LG
Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction
arXiv cs.LG
Safe Reinforcement Learning with Preference-based Constraint Inference
arXiv cs.LG
Residual Attention Physics-Informed Neural Networks for Robust Multiphysics Simulation of Steady-State Electrothermal Energy Systems
arXiv cs.LG