Abstract
Topology control for power grid operation is a challenging sequential decision-making problem: the action space grows combinatorially with the size of the grid, and evaluating actions through simulation is computationally expensive. We propose a physics-informed reinforcement learning framework that combines semi-Markov control with a Gibbs prior over the action space that encodes the system's physics. A decision is taken only when the grid enters a hazardous regime, and a graph neural network surrogate predicts the post-action overload risk of feasible topology actions. These predictions define a physics-informed Gibbs prior that both selects a small, state-dependent candidate set and reweights the policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach on three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately $6\times$ faster on the first benchmark; it reaches 94.6\% of the oracle reward with roughly $200\times$ lower decision time on the second; and on the most challenging benchmark it improves over a PPO baseline by up to 255\% in reward and 284\% in survived steps while remaining about $2.5\times$ faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
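The candidate selection and logit reweighting described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `gibbs_reweight`, the inverse temperature `beta`, and the candidate-set size `k` are assumed hyperparameters, and the surrogate's risk predictions are taken as given inputs.

```python
import numpy as np

def gibbs_reweight(risk, policy_logits, beta=5.0, k=3):
    """Combine surrogate risk scores with policy logits via a Gibbs prior.

    risk          : predicted post-action overload risk per action (lower = safer)
    policy_logits : raw logits from the learned policy
    beta          : inverse temperature of the Gibbs prior (assumed hyperparameter)
    k             : size of the state-dependent candidate set (assumed hyperparameter)
    """
    risk = np.asarray(risk, dtype=float)
    log_prior = -beta * risk                       # Gibbs prior: p(a) proportional to exp(-beta * risk)
    candidates = np.argsort(risk)[:k]              # keep only the k lowest-risk actions
    combined = np.full(len(risk), -np.inf)         # mask out non-candidate actions
    combined[candidates] = np.asarray(policy_logits, dtype=float)[candidates] + log_prior[candidates]
    probs = np.exp(combined - combined.max())      # numerically stable softmax
    return candidates, probs / probs.sum()
```

With a uniform policy, sampling then concentrates on the lowest-risk topology actions; as the policy logits sharpen during training, they can override the prior within the candidate set.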