Reference-state System Reliability method for scalable uncertainty quantification of coherent systems

arXiv cs.LG / 4/21/2026

Key Points

  • The paper introduces the Reference-state System Reliability (RSR) method to enable scalable uncertainty quantification for coherent systems such as infrastructure networks and supply chains.
  • Unlike decomposition-based techniques, which scale poorly as the number of components grows, RSR uses reference states to classify Monte Carlo samples rather than to decompose the state space, making computational cost far less sensitive to the number of reference states.
  • RSR improves runtime by storing samples and reference states as matrices and using batched matrix operations, leveraging high-throughput matrix computing advances from modern machine learning.
  • The authors report evaluating the system-state probability of a graph with 119 nodes and 295 edges within 10 seconds, and demonstrate scaling to hundreds of thousands of reference states as well as support for multi-state systems.
  • The method’s convergence slows when the number of boundary reference states becomes extremely large, motivating future research into learning-based representations of system-state boundaries.

Abstract

Coherent systems are representative of many practical applications, ranging from infrastructure networks to supply chains. Probabilistic evaluation of such systems remains challenging, however, because existing decomposition-based methods scale poorly as the number of components grows. To address this limitation, this study proposes the Reference-state System Reliability (RSR) method. Like existing approaches, RSR characterises the boundary between different system states using reference states in the component-state space. Where it departs from these methods is in how the state space is explored: rather than using reference states to decompose the space into disjoint hypercubes, RSR uses them to classify Monte Carlo samples, making computational cost significantly less sensitive to the number of reference states. To make this classification efficient, samples and reference states are stored as matrices and compared using batched matrix operations, allowing RSR to exploit the advances in high-throughput matrix computing driven by modern machine learning. We demonstrate that RSR evaluates the system-state probability of a graph with 119 nodes and 295 edges within 10 seconds, highlighting its potential for real-time risk assessment of large-scale systems. We further show that RSR scales to problems involving hundreds of thousands of reference states (well beyond the reach of existing methods) and extends naturally to multi-state systems. Nevertheless, when the number of boundary reference states grows exceedingly large, RSR's convergence slows down, a limitation shared with existing reference-state-based approaches that motivates future research into learning-based representations of system-state boundaries.
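
The classification step described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes a binary coherent system in which a sample is labelled as surviving when it is componentwise no worse than some survival reference state, and as failed when it is componentwise no better than some failure reference state. The function name `classify_samples`, the toy three-component system, and the component reliability of 0.9 are all hypothetical, and NumPy stands in for whatever matrix backend the paper actually uses.

```python
import numpy as np


def classify_samples(samples, survival_refs, failure_refs):
    """Label each sample: 1 = survival, 0 = failure, -1 = not yet classifiable.

    samples:       (N, C) integer component states (higher = better).
    survival_refs: (S, C) reference states assumed to imply system survival.
    failure_refs:  (F, C) reference states assumed to imply system failure.
    """
    # Broadcast to (N, S, C) / (N, F, C) so every sample is compared against
    # every reference state in a single batched operation.
    dominates = (samples[:, None, :] >= survival_refs[None, :, :]).all(axis=2)
    dominated = (samples[:, None, :] <= failure_refs[None, :, :]).all(axis=2)

    labels = np.full(samples.shape[0], -1, dtype=np.int8)
    labels[dominates.any(axis=1)] = 1   # at least as good as a survival state
    labels[dominated.any(axis=1)] = 0   # at most as good as a failure state
    return labels


# Hypothetical toy system: components 0 and 1 in series, in parallel with
# component 2, i.e. the system works if (c0 and c1) or c2.
survival_refs = np.array([[1, 1, 0], [0, 0, 1]], dtype=np.int8)
failure_refs = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.int8)

# Monte Carlo samples: each component works independently with probability 0.9
# (an assumed value chosen only for illustration).
rng = np.random.default_rng(0)
samples = (rng.random((100_000, 3)) < 0.9).astype(np.int8)

labels = classify_samples(samples, survival_refs, failure_refs)
survival_estimate = float((labels == 1).mean())
unknown_fraction = float((labels == -1).mean())

# Exact value for this toy system: 1 - (1 - 0.9**2) * (1 - 0.9) = 0.981
print(f"estimated survival probability: {survival_estimate:.4f}")
print(f"fraction of unclassified samples: {unknown_fraction:.4f}")
```

In this sketch the work per batch is one broadcast comparison against each set of reference states. When the reference set is incomplete, some samples remain labelled -1, and the fraction of such samples bounds how far the estimate can be from the true probability.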