Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

arXiv cs.LG / 4/3/2026


Key Points

  • The paper introduces a physics-informed reinforcement learning approach for power-grid topology control, addressing the combinatorial growth of the action space and the high cost of simulating action outcomes.
  • It combines semi-Markov control with a Gibbs prior that encodes physical system constraints over the action space, so decisions are taken primarily when the grid enters hazardous regimes.
  • A graph neural network surrogate predicts post-action overload risk, and those predictions are used to construct a state-dependent candidate action set and reweight policy logits for more efficient action selection.
  • Experiments on three increasingly difficult realistic benchmarks show strong trade-offs between control quality and computational efficiency, including near-oracle performance on simpler tasks and substantial gains over PPO and specialized baselines on harder settings.
  • Overall, the results suggest the method preserves the flexibility of learned policies while significantly reducing exploration difficulty, online simulation cost, and decision latency for grid topology control.
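The candidate-set construction and logit reweighting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the risk array, and the temperature `beta` are all assumptions. The idea is that a Gibbs prior assigns log-weight proportional to the negative predicted overload risk, so low-risk actions dominate; only the k lowest-risk actions are retained as the state-dependent candidate set, and the prior is added to the policy logits in log-space before sampling.

```python
import numpy as np

def gibbs_reweight(policy_logits, predicted_risk, beta=5.0, k=8):
    """Illustrative sketch: combine learned policy logits with a Gibbs
    prior built from surrogate-predicted overload risk.

    All names and values here are hypothetical, not the paper's code.
    """
    # State-dependent candidate set: the k actions with lowest predicted risk.
    candidates = np.argsort(predicted_risk)[:k]

    # Gibbs prior log-weights over candidates: log p(a) proportional to
    # -beta * risk(a), so safer actions receive higher prior mass.
    prior_logits = -beta * predicted_risk[candidates]

    # Reweight policy logits by adding the prior in log-space, then softmax.
    combined = policy_logits[candidates] + prior_logits
    combined -= combined.max()  # numerical stability before exponentiation
    probs = np.exp(combined) / np.exp(combined).sum()
    return candidates, probs
```

Restricting the softmax to the candidate set is what reduces online simulation cost: only k actions, rather than the full combinatorial action space, need to be evaluated or sampled at decision time.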

Abstract

Topology control for power grid operation is a challenging sequential decision-making problem because the action space grows combinatorially with the size of the grid and action evaluation through simulation is computationally expensive. We propose a physics-informed Reinforcement Learning framework that combines semi-Markov control with a Gibbs prior over the action space that encodes the system's physics. Decisions are taken only when the grid enters a hazardous regime, while a graph neural network surrogate predicts the post-action overload risk of feasible topology actions. These predictions are used to construct a physics-informed Gibbs prior that both selects a small state-dependent candidate set and reweights policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach in three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately 6× faster on the first benchmark, reaches 94.6% of oracle reward with roughly 200× lower decision time on the second one, and on the most challenging benchmark improves over a PPO baseline by up to 255% in reward and 284% in survived steps while remaining about 2.5× faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
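The hazard-triggered, semi-Markov aspect of the abstract can be illustrated with a short sketch. This is a hypothetical outline under assumed names: the line-loading threshold, `is_hazardous`, and the callbacks are illustrative, not from the paper. The point is that the expensive topology decision machinery runs only on entry into a hazardous regime; in normal operation a do-nothing action is applied, which is what keeps decision latency low on average.

```python
def is_hazardous(line_loadings, threshold=0.95):
    """Flag a hazardous regime when any line exceeds `threshold` of its
    thermal limit (illustrative criterion, not the paper's exact rule)."""
    return max(line_loadings) > threshold

def control_step(line_loadings, choose_topology_action, do_nothing_action):
    # Semi-Markov control: invoke the learned policy (and its surrogate-based
    # candidate construction) only in hazardous states; otherwise skip.
    if is_hazardous(line_loadings):
        return choose_topology_action(line_loadings)
    return do_nothing_action
```

Because non-hazardous steps bypass the policy entirely, the number of surrogate and simulator calls scales with the number of hazard events rather than with episode length.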