Interpretable Deep Reinforcement Learning for Element-level Bridge Life-cycle Optimization

arXiv cs.AI / 4/6/2026


Key Points

  • The study addresses the shift under the new Specifications for the National Bridge Inventory (SNBI) toward element-level condition states, which increases the granularity of bridge condition data but also substantially expands the RL state space (a concrete sketch follows this list).
  • It proposes an interpretable deep reinforcement learning framework that produces life-cycle decision policies as oblique decision trees, making the results human-auditable and easier to integrate into existing bridge management systems.
  • To achieve near-optimal performance while keeping policies interpretable, the method uses differentiable soft tree actor models, temperature annealing during training, and regularization with pruning to control tree complexity.
  • The approach is demonstrated on a steel girder bridge life-cycle optimization problem and evaluated across supervised and reinforcement learning settings, highlighting benefits and trade-offs of the proposed techniques.
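
To make the state-space expansion concrete, here is a minimal Python sketch contrasting the legacy single-integer component rating with an element-level CS-proportion state. The four condition states CS1-CS4 follow the SNBI convention; the variable names and quantities are illustrative, not taken from the paper.

```python
import numpy as np

# Legacy component-level state: one categorical integer rating.
legacy_state = 6  # a single "good/fair/poor"-style rating

# SNBI element-level state: proportions of the element's total quantity in
# each of the four condition states CS1..CS4. The state is a continuous
# 4-dimensional probability array rather than a single integer.
cs_quantities = np.array([120.0, 40.0, 30.0, 10.0])  # e.g., deck area (m^2) per CS
cs_proportions = cs_quantities / cs_quantities.sum()
assert np.isclose(cs_proportions.sum(), 1.0)

print(cs_proportions)  # e.g., [0.6 0.2 0.15 0.05] -- the RL state for one element
```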

Abstract

The new Specifications for the National Bridge Inventory (SNBI), in effect since 2022, emphasize the use of element-level condition states (CS) for risk-based bridge management. Instead of a single general component rating, element-level condition data use an array of relative CS quantities (i.e., CS proportions) to represent the condition of a bridge. Although this greatly increases the granularity of bridge condition data, it complicates the search for optimal life-cycle policies because the state space expands from a single categorical integer to a four-dimensional probability array. This study proposes a new interpretable reinforcement learning (RL) approach that seeks optimal life-cycle policies based on element-level state representations. Compared to existing RL methods, the proposed algorithm yields life-cycle policies in the form of oblique decision trees with a reasonable number of nodes and a limited depth, making them directly understandable and auditable by humans and easy to implement in current bridge management systems. To achieve near-optimal policies, the proposed approach introduces three major improvements to existing RL methods: (a) the use of differentiable soft tree models as actor function approximators, (b) a temperature annealing process during training, and (c) regularization paired with pruning rules to limit policy complexity. Collectively, these improvements yield interpretable life-cycle policies in the form of deterministic oblique decision trees. The benefits and trade-offs of these techniques are demonstrated in both supervised and reinforcement learning settings, and the resulting framework is illustrated on a life-cycle optimization problem for steel girder bridges.
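
The abstract's first two ingredients can be illustrated with a toy sketch: a single oblique split node (the building block of a soft decision tree) routes the CS-proportion state through a sigmoid gate with temperature tau, and annealing tau toward zero hardens the gate so the learned policy collapses to a deterministic oblique rule. All names, weights, and the two-action setup here are hypothetical assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def soft_tree_policy(x, w, b, tau):
    """Depth-1 soft oblique tree: probability of routing to the 'repair' leaf.

    x    : element-level state (CS proportions), shape (4,)
    w, b : learnable oblique-split parameters (a linear combination of features)
    tau  : temperature; large tau gives soft, differentiable routing suitable
           for an actor network, while tau -> 0 recovers a hard split.
    """
    return 1.0 / (1.0 + np.exp(-(w @ x + b) / tau))

x = np.array([0.6, 0.2, 0.15, 0.05])  # CS proportions for one element
w = np.array([-1.0, 0.0, 1.0, 2.0])   # hypothetical weights favoring worse CS
b = -0.1

for tau in [1.0, 0.1, 0.01]:          # a simple temperature annealing schedule
    print(f"tau={tau:5.2f}  P(repair)={soft_tree_policy(x, w, b, tau):.3f}")

# As tau -> 0 the routing probability saturates to 0 or 1, so the trained soft
# tree can be read off as a deterministic, human-auditable oblique rule:
#   if w @ x + b > 0: repair  else: do nothing
```

The paper's third ingredient, regularization paired with pruning rules, would act on top of such splits to keep the node count and depth small; that part is omitted from this sketch.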