Unifying Runtime Monitoring Approaches for Safety-Critical Machine Learning: Application to Vision-Based Landing

arXiv cs.LG / 4/30/2026

Key Points

  • The paper observes that runtime monitoring methods for safety-critical ML are fragmented, having emerged independently from different research communities.
  • It introduces a unified framework that sorts runtime monitors into three types: Operational Design Domain (ODD) monitoring, Out-of-Distribution (OOD) monitoring, and Out-of-Model-Scope (OMS) monitoring.
  • ODD monitoring focuses on verifying compliance with expected operating conditions, while OOD monitoring rejects inputs that differ from the training distribution.
  • OMS monitoring detects abnormal model behavior using internal states or outputs, complementing the other two monitor types.
  • The authors demonstrate the framework's benefits with an experiment on vision-based runway detection during landing, comparing monitors with common safety-oriented metrics.
  • The framework aims to help practitioners design monitoring activities from complementary categories of monitors, and to evaluate and compare different monitors consistently.
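
The three categories above can be pictured as three independent checks on the same input. The following is a minimal sketch, not the authors' implementation: the `Frame` fields, the thresholds, and the specific checks (a visibility limit for ODD, a distance to the training mean for OOD, a confidence floor for OMS) are all assumptions chosen only to make each category concrete.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Frame:
    """One camera frame plus context. All fields are hypothetical."""
    visibility_km: float        # operating condition, read by the ODD check
    features: Sequence[float]   # image embedding, read by the OOD check
    confidence: float           # detector output score, read by the OMS check


def odd_monitor(frame: Frame, min_visibility_km: float = 5.0) -> bool:
    """ODD: are the operating conditions within the expected envelope?"""
    return frame.visibility_km >= min_visibility_km


def ood_monitor(frame: Frame, train_mean: Sequence[float],
                max_distance: float = 3.0) -> bool:
    """OOD: is the input close enough to the training data?

    Assumption: Euclidean distance to the training mean stands in for
    whatever OOD score a real monitor would use.
    """
    dist = sum((x - m) ** 2 for x, m in zip(frame.features, train_mean)) ** 0.5
    return dist <= max_distance


def oms_monitor(frame: Frame, min_confidence: float = 0.8) -> bool:
    """OMS: does the model's own output look normal?"""
    return frame.confidence >= min_confidence


def run_monitors(frame: Frame, train_mean: Sequence[float]) -> Dict[str, bool]:
    """Run all three checks and report each verdict separately."""
    return {
        "ODD": odd_monitor(frame),
        "OOD": ood_monitor(frame, train_mean),
        "OMS": oms_monitor(frame),
    }


def accept(verdicts: Dict[str, bool]) -> bool:
    """Accept the model's prediction only if every monitor passes."""
    return all(verdicts.values())
```

Keeping the verdicts separate, rather than returning a single flag, makes it visible which category rejected an input, which is one way to see whether the monitors are actually complementary.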

Abstract

Runtime monitoring is essential to ensure the safety of ML applications in safety-critical domains. However, current research is fragmented, with independent methods emerging from different communities. In this paper, we propose a unified framework categorising runtime monitoring approaches into three distinct types: Operational Design Domain (ODD) monitoring, which ensures compliance with expected operating conditions; Out-of-Distribution (OOD) monitoring, which rejects inputs that deviate from the training data; and Out-of-Model-Scope (OMS) monitoring, which detects anomalous model behaviour based on its internal states or outputs. We demonstrate the benefits of this categorization with a dedicated experiment on an aeronautical safety-critical application: runway detection during landing. This framework facilitates design of monitoring activities, with complementary categories of monitors, and enables evaluation and comparison of different monitors using common, safety-oriented metrics.
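
The abstract does not name the metrics used, so the following is only an illustrative sketch of what scoring different monitors on a common footing might look like. The function name `monitor_metrics` and the three quantities it returns are assumptions, not the paper's definitions. The idea is that any monitor, whatever its category, reduces to a per-sample accept/reject decision that can be scored against whether the model was actually wrong.

```python
from typing import Dict, Sequence


def monitor_metrics(model_error: Sequence[bool],
                    rejected: Sequence[bool]) -> Dict[str, float]:
    """Score one monitor against ground truth about model errors.

    model_error[i] is True when the model's prediction on sample i was wrong.
    rejected[i] is True when the monitor rejected sample i.
    The metric names below are hypothetical, chosen for illustration.
    """
    if len(model_error) != len(rejected):
        raise ValueError("inputs must have the same length")

    n = len(model_error)
    errors = sum(bool(e) for e in model_error)
    correct = n - errors
    caught = sum(bool(e) and bool(r) for e, r in zip(model_error, rejected))
    false_alarms = sum((not e) and bool(r) for e, r in zip(model_error, rejected))
    missed = errors - caught
    nan = float("nan")

    return {
        # Share of model errors that the monitor rejected.
        "error_detection_rate": caught / errors if errors else nan,
        # Share of correct predictions that the monitor rejected anyway.
        "false_alarm_rate": false_alarms / correct if correct else nan,
        # Share of all samples that were wrong and still accepted.
        "residual_error_rate": missed / n if n else nan,
    }
```

Calling `monitor_metrics` once per monitor with the same `model_error` labels yields directly comparable numbers, regardless of whether the monitor looks at operating conditions, inputs, or model internals.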