When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide

arXiv cs.CV / 4/22/2026


Key Points

  • The paper addresses a key barrier to using deep neural networks in safety-critical settings: even highly accurate networks have no internal mechanism to flag uncertain or erroneous predictions, so a single undetected error can slip through.
  • It proposes a post-hoc, explanation-based reliability indicator for binary defect detection that aims to proactively catch false negatives by comparing class-specific vs class-agnostic discriminative heatmaps.
  • The method computes a reliability score using the difference in Intersection over Union (IoU) between those heatmaps, and adds an adversarial enhancement step to further amplify the signal.
  • Experiments on two industrial defect detection benchmarks show the approach can effectively identify false negatives, reaching 100% recall with adversarial enhancement while trading off performance on true negatives.
  • Overall, the authors argue for a new “data-model-explanation-output” deployment paradigm that goes beyond end-to-end black-box predictions to better support trustworthy real-world AI.
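The core reliability check above can be illustrated with a minimal sketch. The paper's exact formula is not reproduced here; this assumes both heatmaps are normalized to [0, 1] and binarized at a fixed threshold (the threshold value and function names are illustrative assumptions):

```python
import numpy as np

def binarize(heatmap, thresh=0.5):
    """Binarize a heatmap normalized to [0, 1]; the threshold is an assumption."""
    return heatmap >= thresh

def iou(mask_a, mask_b):
    """Intersection over Union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 1.0

def reliability_score(class_specific, class_agnostic, thresh=0.5):
    """Sketch of the idea: a large disagreement (low IoU) between the
    class-specific and class-agnostic heatmaps suggests the network's
    'no defect' output may be a false negative. Not the paper's exact
    computation."""
    return iou(binarize(class_specific, thresh),
               binarize(class_agnostic, thresh))
```

In this reading, a clean "no defect" prediction should yield heatmaps that largely agree; a low score would route the sample to human review rather than being trusted blindly.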

Abstract

The deployment of AI systems in safety-critical domains, such as industrial defect inspection, autonomous driving, and medical diagnosis, is severely hampered by their lack of reliability. A single undetected erroneous prediction can lead to catastrophic outcomes. Unfortunately, there is often no alternative but to place trust in the outputs of a trained AI system, which operates without an internal safeguard to flag unreliable predictions, even in cases of high accuracy. We propose a post-hoc explanation-based indicator to detect false negatives in binary defect detection networks. To our knowledge, this is the first method to proactively identify potentially erroneous network outputs. Our core idea leverages the difference between class-specific discriminative heatmaps and class-agnostic ones. We compute the difference in their intersection over union (IoU) as a reliability score. An adversarial enhancement method is further introduced to amplify this disparity. Evaluations on two industrial defect detection benchmarks show our method effectively identifies false negatives. With adversarial enhancement, it achieves 100% recall, albeit with a trade-off for true negatives. Our work thus advocates for a new and trustworthy deployment paradigm: data-model-explanation-output, moving beyond conventional end-to-end systems to provide critical support for reliable AI in real-world applications.
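The abstract's "adversarial enhancement" step is described only as amplifying the heatmap disparity. One plausible mechanism, assumed here and not taken from the paper, is an FGSM-style signed-gradient step on the input that increases a disparity objective. The sketch below uses a finite-difference gradient on a toy scalar function so it stays self-contained; a real implementation would backpropagate through the network:

```python
import numpy as np

def amplify_disparity(x, disparity_fn, eps=0.05, n_steps=1):
    """FGSM-style amplification sketch (assumption: the enhancement is
    gradient-based; the paper's exact procedure is not given here).
    Takes signed-gradient ascent steps on disparity_fn(x), with the
    gradient estimated by central finite differences for this toy."""
    x = x.astype(float).copy()
    h = 1e-4
    for _ in range(n_steps):
        grad = np.zeros_like(x)
        for idx in np.ndindex(x.shape):
            xp, xm = x.copy(), x.copy()
            xp[idx] += h
            xm[idx] -= h
            grad[idx] = (disparity_fn(xp) - disparity_fn(xm)) / (2 * h)
        # eps-bounded step in the direction that increases the disparity
        x += eps * np.sign(grad)
    return x
```

The intuition matches the reported trade-off: pushing inputs to exaggerate heatmap disagreement makes hidden false negatives easier to flag, at the cost of also flagging some true negatives.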