Hierarchical adaptive control for real-time dynamic inference at the edge

arXiv cs.LG / April 30, 2026


Key Points

  • The paper addresses the difficulty of deploying dynamic ML models on heterogeneous edge devices where latency, energy, and memory constraints are strict.
  • It proposes a two-tier hierarchical adaptive control system: a global scheduler that builds a per-node cascade of lightweight specialized classifiers plus a generalist fallback, and a local node controller that reacts to data drift and hardware-resource changes.
  • By enabling/disabling specialized predictors at runtime, the method aims to maintain energy efficiency and avoid violating latency budgets without requiring frequent global redeployment.
  • The approach is evaluated on two datasets with distribution-mismatch scenarios, achieving up to 2.45× lower average inference latency and up to 2.86× lower energy use, while keeping accuracy degradation under 4% versus static baselines.
  • The authors’ contributions include a budgeted specialized-predictor cascade formulation that preserves worst-case latency constraints and an experimental validation on embedded hardware.
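The cascade idea above — try cheap specialized predictors first and fall back to a generalist only when none is confident — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the class names, the confidence rule, and the toy stand-in models are all hypothetical.

```python
class SpecializedPredictor:
    """Hypothetical lightweight classifier covering part of the input space."""

    def __init__(self, name, low, high, threshold=0.8):
        self.name, self.low, self.high = name, low, high
        self.threshold = threshold
        self.enabled = True  # the local controller may toggle this at runtime

    def predict(self, x):
        # Toy stand-in for a real model: confident only inside its region.
        inside = self.low <= x < self.high
        conf = 0.95 if inside else 0.2
        return ("class_A" if x < 5 else "class_B"), conf


def generalist(x):
    # Heavier fallback model (stand-in): always produces an answer.
    return "class_A" if x < 5 else "class_B"


def cascade_infer(x, specialists):
    """Run enabled specialists in order; fall back to the generalist when
    none is confident enough. Early exits keep average latency and energy
    low, while the fallback bounds worst-case accuracy loss."""
    for sp in specialists:
        if not sp.enabled:
            continue
        label, conf = sp.predict(x)
        if conf >= sp.threshold:
            return label, sp.name
    return generalist(x), "generalist"
```

Disabling a specialist (e.g. after drift makes it unreliable) simply removes it from the loop, so the cascade degrades gracefully to the generalist without redeployment.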

Abstract

Industrial systems increasingly depend on Machine Learning (ML) and operate on heterogeneous nodes that must satisfy tight latency, energy, and memory constraints. Dynamic ML models, which reconfigure their computational footprint at runtime, promise high energy efficiency and lower average latency for modest accuracy tradeoffs; however, their deployment is complicated by the additional hyperparameters they rely on. These hyperparameters, which control the accuracy versus average-latency tradeoff, are typically tuned on a calibration dataset that must match the test-time distribution — an assumption that rarely holds in real-world scenarios, leading to suboptimal operating conditions, possibly performing below static models. We propose a two-tier adaptive architecture that co-optimizes model and system decisions. At the global level, a scheduler configures and deploys, for each edge node, a cascade of classifiers composed of lightweight specialized models and a generalist fallback, satisfying latency and memory constraints. At the node level, a local controller tracks data drift and hardware resources, enabling or disabling specialized predictors (SPs) to preserve high energy efficiency and avoid latency-constraint violations under varying conditions. This design allows longer operating times without forcing a global redeployment step, and enables efficient execution when the remote global controller is unreachable. We evaluate the approach on two datasets under controlled distribution-mismatch scenarios, showing average per-inference reductions of up to 2.45× in latency and up to 2.86× in energy, with less than 4% accuracy drop compared to static baselines. Our contributions are: (1) a budgeted SP-cascade formulation that preserves worst-case latency constraints; (2) a hierarchical controller that maintains efficiency under data and resource changes; and (3) an experimental evaluation on embedded hardware.
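The node-level control loop described in the abstract — toggling specialized predictors in response to observed latency — can be illustrated with a simple hysteresis rule. This is a hedged sketch: the class names, the load-shedding policy (disable the last enabled specialist, re-enable the first disabled one), and the headroom margin are illustrative assumptions, not the paper's actual controller.

```python
class Specialist:
    """Minimal stand-in for a specialized predictor slot in the cascade."""

    def __init__(self, name):
        self.name = name
        self.enabled = True


class LocalController:
    """Hypothetical node-level controller: sheds specialists when observed
    per-inference latency exceeds the budget, and re-enables them once
    there is headroom again (margin gives simple hysteresis)."""

    def __init__(self, specialists, latency_budget_ms, margin=0.9):
        self.specialists = specialists
        self.budget = latency_budget_ms
        self.margin = margin

    def update(self, observed_latency_ms):
        if observed_latency_ms > self.budget:
            # Over budget: disable the last enabled specialist in the cascade.
            for sp in reversed(self.specialists):
                if sp.enabled:
                    sp.enabled = False
                    break
        elif observed_latency_ms < self.margin * self.budget:
            # Headroom: re-enable the first disabled specialist.
            for sp in self.specialists:
                if not sp.enabled:
                    sp.enabled = True
                    break
```

Because the controller only flips local enable flags, the node keeps serving requests within its latency budget without contacting the global scheduler — the property the paper highlights for operation when the remote controller is unreachable.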