Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization

arXiv cs.LG / 4/9/2026


Key Points

  • The paper introduces Smart Commander, a hierarchical reinforcement learning (HRL) framework aimed at optimizing military aviation Prognostics and Health Management (PHM) decisions across large aircraft fleets despite sparse, delayed feedback and stochastic mission profiles.
  • It decomposes the control task into two tiers: a fleet-level strategic “General Commander” that optimizes availability and cost, and multiple tactical “Operation Commanders” that handle sortie generation, maintenance scheduling, and logistics resource allocation.
  • The approach combines layered reward shaping with planning-enhanced neural networks to better cope with the curse of dimensionality and sparse/delayed rewards that challenge conventional monolithic deep reinforcement learning.
  • Evaluation in a custom high-fidelity discrete-event simulation shows Smart Commander outperforms both monolithic DRL and rule-based baselines, with reported improvements in training efficiency, scalability, and robustness in failure-prone scenarios.
  • Overall, the results suggest HRL could be a practical and reliable paradigm for next-generation intelligent fleet management under realistic operational constraints.
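The two-tier decomposition described above can be sketched in a minimal control loop. This is an illustrative toy, not the paper's implementation: the class names (`GeneralCommander`, `OperationCommander`), the directives, and the hand-coded policies are all hypothetical stand-ins for the learned strategic and tactical policies.

```python
class OperationCommander:
    """Tactical tier: maps a strategic directive to a primitive action
    for one domain (sortie generation, maintenance, or logistics)."""
    def __init__(self, domain):
        self.domain = domain

    def act(self, directive, fleet_health):
        # Toy policy: dispatch aircraft when availability is prioritized,
        # otherwise route them into maintenance to contain cost.
        if directive == "maximize_availability":
            return f"{self.domain}:dispatch"
        return f"{self.domain}:schedule_maintenance"


class GeneralCommander:
    """Strategic tier: issues a fleet-level directive at each decision epoch."""
    def directive(self, fleet_health):
        # Toy strategy: push sorties while fleet health is high,
        # switch to a cost/maintenance focus once it degrades.
        return "maximize_availability" if fleet_health > 0.6 else "minimize_cost"


def run_episode(steps=8):
    general = GeneralCommander()
    tacticals = [OperationCommander(d) for d in ("sortie", "maintenance", "logistics")]
    fleet_health, log = 1.0, []
    for _ in range(steps):
        d = general.directive(fleet_health)
        log.append((d, [oc.act(d, fleet_health) for oc in tacticals]))
        fleet_health -= 0.1  # flying wears the fleet down
    return log
```

In the actual framework both tiers would be learned policies interacting with the discrete-event simulator; the sketch only shows how the strategic directive constrains the tactical action space at each step.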

Abstract

Decision-making in military aviation Prognostics and Health Management (PHM) faces significant challenges due to the "curse of dimensionality" in large-scale fleet operations, combined with sparse feedback and stochastic mission profiles. To address these issues, this paper proposes Smart Commander, a novel Hierarchical Reinforcement Learning (HRL) framework designed to optimize sequential maintenance and logistics decisions. The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation. The proposed approach is validated within a custom-built, high-fidelity discrete-event simulation environment that captures the dynamics of aircraft configuration and support logistics. By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards. Empirical evaluations demonstrate that Smart Commander significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines. Notably, it achieves a substantial reduction in training time while demonstrating superior scalability and robustness in failure-prone environments. These results highlight the potential of HRL as a reliable paradigm for next-generation intelligent fleet management.
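The abstract's "layered reward shaping" is not specified in detail here; one standard way to densify a sparse fleet-level signal while preserving the optimal policy is potential-based reward shaping (Ng et al., 1999). The sketch below assumes a hypothetical potential over fleet availability; the function names and state fields are illustrative, not from the paper.

```python
def potential(state):
    """Hypothetical potential: fraction of mission-capable aircraft."""
    return state["available"] / state["fleet_size"]


def shaped_reward(sparse_r, prev_state, next_state, gamma=0.99):
    """Potential-based shaping F = gamma*phi(s') - phi(s), added to the
    sparse reward. This densifies feedback (each availability change is
    rewarded immediately) without altering the optimal policy."""
    return sparse_r + gamma * potential(next_state) - potential(prev_state)
```

A layered scheme would apply a term like this at each tier, e.g. availability-based shaping for the strategic commander and turnaround-time shaping for the tactical commanders, so that each level receives feedback at its own timescale.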