
DynamicGate-MLP: Conditional Computation via Learned Structural Dropout and Input-Dependent Gating for Functional Plasticity

arXiv cs.LG / 3/18/2026


Key Points

  • The paper introduces DynamicGate-MLP, a framework that combines regularization-like dropout with input-dependent conditional computation via learned gates to adapt computation to each input.
  • It defines continuous gate probabilities and, during inference, derives a discrete execution mask to select the active path, enabling sample-specific computation.
  • Training uses a penalty on expected gate usage and a Straight-Through Estimator to optimize the discrete mask, balancing accuracy and compute budget.
  • The method is evaluated on MNIST, CIFAR-10, Tiny-ImageNet, Speech Commands, and PBMC3k, comparing against MLP baselines and MoE-style variants, with compute efficiency measured via gate activation ratios and a layer-weighted MAC metric rather than wall-clock latency.
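The gating mechanism in the second key point can be illustrated with a minimal sketch. The gate network here is assumed to be a single linear layer (`Wg`) and the threshold 0.5 is an assumption; the paper's actual gate architecture may differ. The sketch shows the inference-time path: continuous gate probabilities are computed from the input, thresholded into a discrete execution mask, and applied to the hidden units, yielding a per-sample gate activation ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hidden = 8, 16
W1 = rng.standard_normal((d_in, d_hidden)) * 0.1  # main MLP weights
Wg = rng.standard_normal((d_in, d_hidden)) * 0.1  # gate network (assumed linear)

def forward(x, threshold=0.5):
    # Continuous gate probabilities, conditioned on the input.
    p = 1.0 / (1.0 + np.exp(-(x @ Wg)))
    # Discrete execution mask derived at inference time.
    mask = (p > threshold).astype(float)
    # Only gated-on units contribute; gated-off units are skipped.
    h = np.maximum(x @ W1, 0.0) * mask
    # Fraction of active gates, used as a compute-efficiency measure.
    ratio = mask.mean()
    return h, mask, ratio

x = rng.standard_normal((4, d_in))
h, mask, ratio = forward(x)
```

Because the mask depends on `x`, different samples activate different subsets of units, which is the sample-specific computation the paper describes.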

Abstract

Dropout is a representative regularization technique that stochastically deactivates hidden units during training to mitigate overfitting. In contrast, standard inference executes the full network with dense computation, so its goal and mechanism differ from conditional computation, where the executed operations depend on the input. This paper presents DynamicGate-MLP, a single framework that simultaneously satisfies both the regularization view and the conditional-computation view. Instead of a random mask, the proposed model learns gates that decide whether to use each unit (or block), suppressing unnecessary computation while implementing sample-dependent execution that concentrates computation on the parts needed for each input. To this end, we define continuous gate probabilities and, at inference time, generate a discrete execution mask from them to select an execution path. Training controls the compute budget via a penalty on expected gate usage and uses a Straight-Through Estimator (STE) to optimize the discrete mask. We evaluate DynamicGate-MLP on MNIST, CIFAR-10, Tiny-ImageNet, Speech Commands, and PBMC3k, and compare it with various MLP baselines and MoE-style variants. Compute efficiency is compared under a consistent criterion using gate activation ratios and a layer-weighted relative MAC metric, rather than wall-clock latency that depends on hardware and backend kernels.
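The training objective in the abstract combines a task loss with a penalty on expected gate usage, optimized through the discrete mask via a Straight-Through Estimator. The following is a minimal sketch of that mechanism without an autograd framework; the sigmoid gate, the 0.5 threshold, and the penalty weight `lam` are assumptions, not details from the paper. The forward pass uses the hard mask, while the backward pass pretends the threshold is the identity and differentiates through the soft probability instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ste_forward(logits):
    p = sigmoid(logits)             # continuous gate probabilities
    mask = (p > 0.5).astype(float)  # discrete mask used in the forward pass
    return mask, p

def ste_backward(upstream, p):
    # STE: the threshold has zero gradient almost everywhere, so the
    # upstream gradient is routed through the soft probability instead,
    # using d p / d logits = p * (1 - p).
    return upstream * p * (1.0 - p)

def usage_penalty(p, lam=0.01):
    # Penalizing the expected gate usage controls the compute budget:
    # its gradient pushes gate logits down unless the task loss resists.
    return lam * p.mean()

logits = np.array([-2.0, 0.5, 3.0])
mask, p = ste_forward(logits)          # mask is [0., 1., 1.]
penalty = usage_penalty(p)
# Gradient of the penalty w.r.t. the logits, flowing through the STE:
grad = ste_backward(np.full_like(p, 0.01 / p.size), p)
```

Without the STE, the thresholding step would block all gradient flow into the gate logits, so the gates could never be trained end to end.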