ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

arXiv cs.LG / 3/24/2026


Key Points

  • ALMAB-DC is a GP-based sequential experimental design framework for expensive, gradient-free black-box optimization that combines active learning, multi-armed bandits, and distributed asynchronous execution.
  • It uses a Gaussian-process surrogate with uncertainty-aware acquisition to select informative query points, while a UCB/Thompson-sampling bandit controller allocates evaluation budgets across parallel workers.
  • An asynchronous scheduler is designed to handle heterogeneous runtimes and improve practical throughput in distributed settings.
  • The paper reports cumulative regret bounds for the bandit components and characterizes parallel scalability using Amdahl’s Law, supported by measured speedups up to 7.5× with 16 agents.
  • Experiments on five benchmarks show lower simple regret and stronger performance than baselines across both statistical design tasks and ML/engineering applications, including 93.4% CIFAR-10 accuracy, lower CFD airfoil drag, and a 50% RL return gain versus Grid Search.
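The Amdahl's Law scalability claim in the last two points can be sanity-checked with a short sketch. The parallel fraction `p` below is inferred from the reported 7.5× speedup at K = 16; it is an illustrative assumption, not a figure stated in the paper.

```python
# Back-of-envelope check of the reported 7.5x speedup at K = 16 workers
# against Amdahl's Law: S(K) = 1 / ((1 - p) + p / K), where p is the
# parallelizable fraction of the workload.

def amdahl_speedup(p: float, k: int) -> float:
    """Ideal speedup with parallel fraction p on k workers."""
    return 1.0 / ((1.0 - p) + p / k)

def implied_parallel_fraction(speedup: float, k: int) -> float:
    """Invert Amdahl's Law: the p that yields `speedup` on k workers."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / k)

p = implied_parallel_fraction(7.5, 16)          # ~0.924
print(f"implied parallel fraction: {p:.3f}")
print(f"predicted speedup at K=4:  {amdahl_speedup(p, 4):.2f}x")
print(f"predicted speedup at K=16: {amdahl_speedup(p, 16):.2f}x")
```

By this reading, a 7.5× speedup on 16 agents is consistent with roughly 92% of the workload parallelizing cleanly, with the remainder (surrogate refits, scheduler coordination) staying serial.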

Abstract

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose **ALMAB-DC**, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose–response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at K = 4 the distributed setting reaches target performance in one-quarter of the sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4% CIFAR-10 accuracy (outperforming BOHB by 1.7 pp and Optuna by 1.1 pp), reduces airfoil drag to C_D = 0.059 (36.9% below Grid Search), and improves RL return by 50% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann–Whitney U tests. Distributed execution achieves a 7.5× speedup at K = 16 agents, consistent with Amdahl's Law.
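The core loop the abstract describes (GP surrogate, uncertainty-aware acquisition, sequential query selection) can be sketched in a few dozen lines. This is a minimal illustration of GP-UCB acquisition, not the paper's implementation; the RBF length-scale, noise level, UCB coefficient `beta`, and the 1-D toy objective are all illustrative assumptions.

```python
# Sketch of a GP-surrogate + UCB sequential-design loop: fit a GP posterior
# with an RBF kernel, then query the candidate maximizing mean + beta * std.
import numpy as np

def rbf(A, B, ls=0.2):
    """RBF kernel matrix between row-stacked points A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(Kss) - np.sum(v**2, 0), 0.0, None)
    return mu, np.sqrt(var)

def ucb_select(X, y, candidates, beta=2.0):
    """Pick the candidate with the highest upper confidence bound."""
    mu, sd = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + beta * sd)]

# Toy expensive black-box objective (illustrative only).
f = lambda x: -np.sin(3 * x[:, 0]) - x[:, 0]**2 + 0.7 * x[:, 0]

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (4, 1))            # initial design
y = f(X)
cands = np.linspace(-2, 2, 201)[:, None]  # candidate pool
for _ in range(10):                       # sequential design rounds
    x_next = ucb_select(X, y, cands)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[None, :]))
print("best observed x, f(x):", X[np.argmax(y), 0], y.max())
```

In ALMAB-DC the analogous loop is layered under a bandit controller that decides which worker's budget each query consumes, and the posterior update is performed asynchronously as heterogeneous evaluations return; the sketch above shows only the single-worker acquisition step.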