Automated co-design of high-performance thermodynamic cycles via graph-based hierarchical reinforcement learning

arXiv cs.LG / 4/16/2026

Key Points

  • The paper proposes a graph-based hierarchical reinforcement learning framework to automate the co-design of thermodynamic cycles by representing cycle structures as graphs with grammar-constrained nodes and edges.
  • It uses a deep learning thermophysical surrogate to enable stable decoding from graphs and to jointly resolve global parameters during optimization.
  • A manager-worker RL setup drives the search: the high-level manager explores structural evolution and proposes candidate configurations, while the low-level worker optimizes parameters and returns performance-based rewards.
  • In heat pump and heat engine case studies, the method reproduces classical configurations and discovers 18 novel heat pump cycles and 21 novel heat engine cycles.
  • The reported novel designs show performance gains of 4.6% (heat pumps) and 133.3% (heat engines) versus classical baselines, suggesting improved efficiency and scalability over expert-driven design.
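The graph encoding described in the key points can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `CycleGraph` class, the `PORT_GRAMMAR` table, and the two validity rules (per-component port counts and a closed working-fluid loop) are hypothetical stand-ins, since the summary does not state the paper's actual grammar. The example graph is the classical four-component vapor-compression heat pump.

```python
from dataclasses import dataclass, field

# Hypothetical grammar: required (inlets, outlets) for each component type.
# The paper's real grammar is not given in the summary.
PORT_GRAMMAR = {
    "compressor": (1, 1),
    "condenser": (1, 1),
    "valve": (1, 1),
    "evaporator": (1, 1),
    "splitter": (1, 2),
    "mixer": (2, 1),
}


@dataclass
class CycleGraph:
    nodes: dict = field(default_factory=dict)  # node id -> component type
    edges: list = field(default_factory=list)  # directed (source id, target id)

    def add_node(self, node_id, kind):
        if kind not in PORT_GRAMMAR:
            raise ValueError(f"unknown component type: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, src, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        self.edges.append((src, dst))

    def is_valid(self):
        """Check the two assumed grammar rules on the current graph."""
        if not self.nodes:
            return False
        # Rule 1: every component has exactly the port counts its type requires.
        for node_id, kind in self.nodes.items():
            n_in = sum(1 for _, dst in self.edges if dst == node_id)
            n_out = sum(1 for src, _ in self.edges if src == node_id)
            if (n_in, n_out) != PORT_GRAMMAR[kind]:
                return False
        # Rule 2: following the flow from any one component reaches all others.
        start = next(iter(self.nodes))
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for src, dst in self.edges:
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen == set(self.nodes)


def basic_heat_pump():
    """Build the classical vapor-compression loop as a graph."""
    graph = CycleGraph()
    order = ["compressor", "condenser", "valve", "evaporator"]
    for kind in order:
        graph.add_node(kind, kind)
    for src, dst in zip(order, order[1:] + order[:1]):
        graph.add_edge(src, dst)
    return graph
```

Under these assumptions, `basic_heat_pump().is_valid()` holds, and removing any one edge breaks the port-count rule. A structural search would only ever propose graphs that pass such a check, which is one plausible reading of "grammar-constrained" here.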

Abstract

Thermodynamic cycles are pivotal in determining the efficacy of energy conversion systems. Traditional design methodologies, which rely on expert knowledge or exhaustive enumeration, are inefficient and scale poorly, which constrains the discovery of high-performance cycles. In this study, we introduce a graph-based hierarchical reinforcement learning approach for the co-design of structure and parameters in thermodynamic cycles. Cycles are encoded as graphs, with components as nodes and connections as edges, subject to grammatical constraints. A deep learning-based thermophysical surrogate enables stable graph decoding and the simultaneous resolution of global parameters. Building on this foundation, we develop a hierarchical reinforcement learning framework in which a high-level manager explores structural evolution and proposes candidate configurations, while a low-level worker optimizes parameters and returns performance rewards that steer the search towards high-performance regions. By integrating the graph representation, the thermophysical surrogate, and manager-worker learning, the method establishes a fully automated pipeline for encoding, decoding, and co-optimization. Using heat pump and heat engine cycles as case studies, the results show that the proposed method not only reproduces classical cycle configurations but also identifies 18 novel heat pump cycles and 21 novel heat engine cycles. Relative to classical cycles, the novel configurations improve performance by 4.6% for heat pumps and 133.3% for heat engines. The method balances efficiency with broad applicability, providing a practical and scalable alternative to expert-driven thermodynamic cycle design.
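The manager-worker division of labor in the abstract can be sketched with a toy loop. This is not the paper's method: the learned high-level policy is replaced by a simple epsilon-greedy choice among three candidate structures, the low-level worker is a grid search over a single parameter, and `TOY_SURROGATE` is a made-up quadratic stand-in for the thermophysical surrogate. All names and numbers are hypothetical.

```python
import random

# Hypothetical toy surrogate: (peak score, best parameter value) per structure.
# These values are invented for illustration and are not results from the paper.
TOY_SURROGATE = {
    "baseline": (3.0, 0.30),
    "variant_a": (3.4, 0.55),
    "variant_b": (3.2, 0.70),
}


def surrogate_score(structure, x):
    """Toy performance model: a concave quadratic in one parameter x."""
    peak, x_opt = TOY_SURROGATE[structure]
    return peak - 4.0 * (x - x_opt) ** 2


def worker_optimize(structure, steps=101):
    """Low-level worker: grid search over x in [0, 1] for a fixed structure."""
    grid = [i / (steps - 1) for i in range(steps)]
    best_x = max(grid, key=lambda x: surrogate_score(structure, x))
    return best_x, surrogate_score(structure, best_x)


def manager_search(episodes=50, epsilon=0.2, seed=0):
    """High-level manager: epsilon-greedy selection among candidate structures."""
    rng = random.Random(seed)
    names = list(TOY_SURROGATE)
    value = {}   # structure -> reward returned by the worker
    best = None  # (structure, parameter, reward) of the best design so far
    for _ in range(episodes):
        untried = [name for name in names if name not in value]
        if untried:
            choice = untried[0]              # try every structure once
        elif rng.random() < epsilon:
            choice = rng.choice(names)       # explore
        else:
            choice = max(value, key=value.get)  # exploit the best estimate
        x, reward = worker_optimize(choice)  # worker tunes the parameter
        value[choice] = reward               # reward flows back to the manager
        if best is None or reward > best[2]:
            best = (choice, x, reward)
    return best
```

Calling `manager_search()` returns the structure, parameter value, and score of the best design found. The point of the sketch is the information flow: the manager only decides which structure to evaluate, and it learns from the score the worker reports after parameter optimization.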