Feasible-First Exploration for Constrained ML Deployment Optimization in Crash-Prone Hierarchical Search Spaces

arXiv cs.LG / April 29, 2026


Key Points

  • The paper targets ML deployment optimization under strict production constraints by treating the problem as a hierarchical mixed-variable search space with many invalid configurations that can crash, OOM, or violate latency limits.
  • It argues that standard black-box optimizers (e.g., TPE and constrained Bayesian optimization) can waste a large share of small evaluation budgets on infeasible trials when valid regions are rare.
  • The authors propose Thermal Budget Annealing (TBA), a feasible-first approach that explicitly maps feasible regions before warm-starting TPE.
  • TBA adds two robustness mechanisms for hostile hardware: early trial timeouts that abort clearly infeasible evaluations, and subspace blacklisting that temporarily suppresses categorical subspaces after repeated failures.
  • The work also introduces DeployBench, a benchmark suite for deployment optimization with hierarchical structure, hidden crash zones, hard constraints, and unequal evaluation costs. On both synthetic tasks and real GPU deployment across multiple GPU targets, TBA's hybrid strategy improves model-family discovery and reduces wasted budget.
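To make the blacklisting idea above concrete, here is a minimal, hypothetical sketch (none of these names or values come from the paper): configurations are sampled at random, one backend hides a simulated crash zone, and its categorical subspace is suppressed for a fixed number of trials after repeated failures.

```python
import random

BACKENDS = ["onnxruntime", "tensorrt", "torchscript"]  # illustrative choices
FAILURE_LIMIT = 3      # failures before a backend subspace is blacklisted
BLACKLIST_TRIALS = 10  # suppress the subspace for this many trials

def evaluate(config):
    """Toy stand-in for a real deployment trial: 'tensorrt' always
    crashes here, mimicking a hidden crash zone."""
    if config["backend"] == "tensorrt":
        raise RuntimeError("simulated crash")
    return float(config["batch_size"])  # dummy objective value

def feasible_first(budget, seed=0):
    """Sample configs, record feasible ones, and temporarily blacklist
    a categorical subspace after repeated failures."""
    rng = random.Random(seed)
    failures = {b: 0 for b in BACKENDS}
    blacklist_until = {b: -1 for b in BACKENDS}  # trial index until which b is suppressed
    feasible = []
    for t in range(budget):
        allowed = [b for b in BACKENDS if blacklist_until[b] < t]
        config = {"backend": rng.choice(allowed),
                  "batch_size": rng.choice([1, 4, 16])}
        try:
            feasible.append((config, evaluate(config)))
        except RuntimeError:
            b = config["backend"]
            failures[b] += 1
            if failures[b] >= FAILURE_LIMIT:
                blacklist_until[b] = t + BLACKLIST_TRIALS
                failures[b] = 0
    return feasible, blacklist_until

feasible, blacklist = feasible_first(budget=40)
```

The feasible set collected this way is what would then warm-start a model-based sampler such as TPE, so the exploitation stage begins with known-valid regions rather than a cold start.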

Abstract

Deploying machine learning models under production constraints requires joint optimization over model family, quantization scheme, runtime backend, and serving configuration. This induces a hierarchical mixed-variable search space in which many configurations are invalid: evaluations may crash, exceed memory limits, or violate latency constraints. Standard black-box optimizers such as Tree-structured Parzen Estimators (TPE) and constrained Bayesian optimization are effective when valid configurations are common, but they can spend a large fraction of a small evaluation budget on invalid or uninformative trials in hostile deployment spaces. This paper studies that regime and asks whether optimization should be decomposed into an explicit exploration stage followed by model-guided exploitation. We propose Thermal Budget Annealing (TBA), a feasible-first exploration procedure that maps valid and feasible regions before warm-starting TPE. The method includes two robustness mechanisms for hostile hardware: trial timeouts that abort clearly infeasible evaluations early, and subspace blacklisting that temporarily suppresses categorical subspaces after repeated failures. We also introduce DeployBench, a benchmark suite for deployment optimization with hierarchical structure, hidden crash zones, hard constraints, and unequal evaluation costs. On synthetic benchmarks and real GPU deployment with five pre-trained vision models across five GPU targets (NVIDIA H100, A100, RTX 5080, L4, and T4), the proposed hybrid improves model-family discovery under tight constraints while reducing wasted budget relative to cold-start TPE.
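The early-timeout mechanism described in the abstract can be approximated in plain Python. The sketch below assumes nothing about the paper's actual implementation: a trial runs in a worker and its result is abandoned once a deadline passes. A real system would use process isolation so a hung or crashing trial can actually be killed; a thread here merely has its result discarded.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

TIMEOUT_S = 0.2  # illustrative per-trial deadline, not a value from the paper

def run_trial(duration):
    """Toy trial whose runtime we control directly."""
    time.sleep(duration)
    return {"latency": duration}

def evaluate_with_timeout(duration, timeout=TIMEOUT_S):
    """Return the trial result, or None if the deadline passes first."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_trial, duration)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            # In a real optimizer the trial process would be killed here;
            # this sketch just abandons the result.
            return None
```

A trial shorter than the deadline returns its result (`evaluate_with_timeout(0.05)`), while a longer one times out and yields `None` (`evaluate_with_timeout(0.5)`), which the outer loop can record as an infeasible evaluation without spending the full trial cost.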