Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models

arXiv cs.CV / 4/15/2026


Key Points

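  • Manually annotated ground truth (GT) labels for echocardiography can contain random errors and systematic biases, so the study examines how robust deep learning segmentation models are to such errors and how erroneous GT labels can be handled.
  • Three types of GT error are simulated on the CAMUS dataset, and a loss-based detection method is compared with one based on Variance of Gradients (VOG); VOG is reported to flag erroneous GT labels with high accuracy during training.
  • The authors also propose "refurbishing" suspected erroneous labels through pseudo-labelling, and show that the performance gain grows as the error level increases.
  • On the other hand, a standard U-Net keeps relatively strong performance under random label errors and moderate systematic errors (up to 50%), so the authors conclude that detection and refurbishment pay off mainly in high-error settings.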
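The key points do not say how a VOG score is computed. A minimal Python sketch follows, assuming the original VOG formulation (Agarwal et al.) carries over to segmentation unchanged: the per-pixel standard deviation of an example's input-gradient map across K training checkpoints, averaged over pixels. The gradient maps are taken as given, since how they are obtained from a segmentation output is not described here; `vog_score`, `flag_suspects` and the top-fraction flagging rule are hypothetical names and choices, not the paper's.

```python
import numpy as np


def vog_score(grad_maps: np.ndarray) -> float:
    """VOG score for one example (assumed formulation, see lead-in).

    grad_maps: shape (K, H, W), the example's input-gradient map at K
    training checkpoints; computing these maps is assumed to happen elsewhere.
    """
    if grad_maps.ndim != 3 or grad_maps.shape[0] < 2:
        raise ValueError("expected shape (K, H, W) with K >= 2 checkpoints")
    # Per-pixel standard deviation across checkpoints (1/K, i.e. ddof=0).
    per_pixel = grad_maps.std(axis=0)
    # Average over pixels to get one scalar per example.
    return float(per_pixel.mean())


def flag_suspects(scores, fraction: float = 0.2) -> np.ndarray:
    """Indices of the `fraction` of examples with the highest scores.

    Hypothetical flagging rule; the paper's threshold is not stated here.
    """
    scores = np.asarray(scores, dtype=float)
    k = int(round(fraction * scores.size))
    # argsort ascending, reverse for descending, keep the top k, return sorted.
    return np.sort(np.argsort(scores)[::-1][:k])


# Usage: one example whose gradients are stable across checkpoints,
# one whose gradients fluctuate (synthetic data, for illustration only).
rng = np.random.default_rng(0)
clean = np.repeat(rng.normal(size=(1, 8, 8)), 5, axis=0)  # same map at every checkpoint
noisy = rng.normal(size=(5, 8, 8))                         # changes between checkpoints
scores = [vog_score(clean), vog_score(noisy)]
print(flag_suspects(scores, fraction=0.5))  # expected: [1]
```

The intuition behind the design is that an example with a wrong label keeps pulling the model in inconsistent directions, so its gradients vary more from checkpoint to checkpoint than those of a correctly labelled example.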

Abstract

Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through manual annotation, but these can be prone to random errors or systematic biases. This study examines the robustness of deep learning models to such errors in echocardiography (echo) segmentation and evaluates a novel strategy for detecting and refurbishing erroneous labels during model training. Using the CAMUS dataset, we simulate three error types, then compare a loss-based GT label error detection method with one based on Variance of Gradients (VOG). We also propose a pseudo-labelling approach to refurbish suspected erroneous GT labels. We assess the performance of our proposed approach under varying error levels. Results show that VOG was highly effective at flagging erroneous GT labels during training. However, a standard U-Net maintained strong performance under random label errors and moderate levels of systematic errors (up to 50%). The detection and refurbishment approach improved performance, particularly under high-error conditions.
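The abstract names two ingredients, loss-based flagging and pseudo-label refurbishment, without giving their exact form. The sketch below is one plausible reading under stated assumptions, not the authors' method: it assumes binary masks, uses a per-sample soft Dice loss as the error signal, flags the highest-loss fraction of samples, and replaces each flagged GT mask with the model's prediction thresholded at 0.5. `soft_dice_loss`, `refurbish`, `fraction` and `threshold` are hypothetical. Feeding VOG scores into the same top-fraction rule instead of Dice losses would give a VOG-based variant.

```python
import numpy as np


def soft_dice_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """1 - soft Dice between a foreground probability map and a binary mask."""
    inter = np.sum(prob * target)
    denom = np.sum(prob) + np.sum(target)
    return float(1.0 - (2.0 * inter + eps) / (denom + eps))


def refurbish(probs: np.ndarray, labels: np.ndarray,
              fraction: float = 0.25, threshold: float = 0.5):
    """Loss-based flagging followed by pseudo-label replacement (hypothetical rule).

    probs:  (N, H, W) foreground probabilities from the current model.
    labels: (N, H, W) binary GT masks, some possibly erroneous.
    Returns (new_labels, flagged_indices); `labels` is not modified.
    """
    losses = np.array([soft_dice_loss(p, y) for p, y in zip(probs, labels)])
    k = int(round(fraction * len(losses)))
    # Assumption: the k samples with the highest loss are treated as suspect.
    flagged = np.sort(np.argsort(losses)[::-1][:k])
    new_labels = labels.copy()
    # Replace each suspect GT mask with the model's thresholded prediction.
    new_labels[flagged] = (probs[flagged] >= threshold).astype(labels.dtype)
    return new_labels, flagged


# Usage on synthetic data: four identical squares, one GT mask shifted sideways
# to mimic a systematic annotation error.
rng = np.random.default_rng(1)
truth = np.zeros((4, 16, 16))
truth[:, 4:12, 4:12] = 1.0
probs = np.clip(truth + rng.normal(0.0, 0.05, truth.shape), 0.0, 1.0)  # model agrees with truth
labels = truth.copy()
labels[2] = np.roll(labels[2], 6, axis=1)  # corrupted GT for sample 2
new_labels, flagged = refurbish(probs, labels, fraction=0.25)
print(flagged)  # expected: [2]
```

In an actual training loop such a step would presumably run periodically, with the refurbished masks used as targets in later epochs; the abstract does not state how often it runs or from which epoch it starts.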