Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models

arXiv cs.CV / 4/15/2026

Key Points

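Manually created ground truth (GT) labels for echocardiography can contain random errors and systematic biases, so the study evaluates how robust deep learning segmentation is to such errors and examines ways of handling them.
The authors simulate three types of GT error on the CAMUS dataset and compare a loss-based detection method with one based on Variance of Gradients (VOG); they report that VOG was highly effective at flagging erroneous GT labels during training.
They also propose "refurbishing" suspected erroneous labels through pseudo-labelling, and report larger performance gains as the error level increases.
However, a standard U-Net maintained relatively strong performance under random label errors and moderate levels of systematic error (up to 50%), so the authors conclude that detection and refurbishment are most useful in high-error settings.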
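
To make the detection step concrete, here is a minimal sketch of a VOG-style score. It follows the general idea behind Variance of Gradients (take per-pixel gradients for a sample at several training checkpoints, compute the variance across checkpoints, then average over pixels), not the paper's exact procedure, which this summary does not describe. The names `vog_score` and `flag_suspects`, the toy gradient values, and the rule "flag the highest-scoring fraction" are all assumptions made for illustration. A loss-based detector could reuse `flag_suspects` by passing per-sample losses instead of VOG scores.

```python
from statistics import mean, pvariance


def vog_score(grad_snapshots):
    """VOG-style score for one training sample (illustrative helper).

    grad_snapshots: list of K checkpoints, each a flat list holding one
    gradient value per pixel for this sample.
    """
    # Regroup so that each tuple holds the K checkpoint values of one pixel.
    per_pixel = zip(*grad_snapshots)
    # Variance across checkpoints per pixel, then the mean over all pixels.
    return mean(pvariance(values) for values in per_pixel)


def flag_suspects(scores, fraction):
    """Return sorted indices of the top `fraction` of samples by score.

    Assumption: a higher score means the label is more suspicious.
    """
    n_flag = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Toy data: three samples, three checkpoints, four pixels each.
stable = [[0.1, 0.1, 0.1, 0.1]] * 3
mild = [[0.1, 0.2, 0.1, 0.2], [0.2, 0.1, 0.2, 0.1], [0.1, 0.2, 0.1, 0.2]]
unstable = [[1.0, -1.0, 0.5, -0.5], [-1.0, 1.0, -0.5, 0.5], [1.0, -1.0, 0.5, -0.5]]

scores = [vog_score(g) for g in (stable, mild, unstable)]
suspects = flag_suspects(scores, fraction=1 / 3)
print(suspects)
```

In this toy run the third sample has gradients that flip sign between checkpoints, so it receives the largest score and is the one flagged.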

Abstract

Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through manual annotation, but these can be prone to random errors or systematic biases. This study examines the robustness of deep learning models to such errors in echocardiography (echo) segmentation and evaluates a novel strategy for detecting and refurbishing erroneous labels during model training. Using the CAMUS dataset, we simulate three error types, then compare a loss-based GT label error detection method with one based on Variance of Gradients (VOG). We also propose a pseudo-labelling approach to refurbish suspected erroneous GT labels. We assess the performance of our proposed approach under varying error levels. Results show that VOG proved highly effective in flagging erroneous GT labels during training. However, a standard U-Net maintained strong performance under random label errors and moderate levels of systematic errors (up to 50%). The detection and refurbishment approach improved performance, particularly under high-error conditions.
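
The refurbishment step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it treats masks as binary (CAMUS annotations are multi-class), assumes the model's per-pixel foreground probabilities are already available, and uses a fixed 0.5 threshold. The function names `pseudo_label` and `refurbish` are hypothetical.

```python
def pseudo_label(prob_map, threshold=0.5):
    """Turn a 2D map of foreground probabilities into a binary mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def refurbish(gt_masks, prob_maps, suspect_indices, threshold=0.5):
    """Replace the GT mask of each suspected sample with a pseudo-label.

    gt_masks: list of 2D binary masks (the possibly erroneous GT labels).
    prob_maps: list of 2D probability maps predicted by the current model.
    suspect_indices: samples flagged by a detector (loss- or VOG-based).
    Samples that are not flagged keep their original GT mask.
    """
    suspects = set(suspect_indices)
    return [
        pseudo_label(prob_maps[i], threshold) if i in suspects else gt_masks[i]
        for i in range(len(gt_masks))
    ]


# Toy data: two 2x2 samples; sample 1 is assumed to have been flagged.
gt_masks = [[[0, 1], [1, 1]], [[1, 1], [1, 1]]]
prob_maps = [[[0.1, 0.9], [0.8, 0.7]], [[0.2, 0.9], [0.1, 0.6]]]

new_masks = refurbish(gt_masks, prob_maps, suspect_indices=[1])
print(new_masks)
```

Only the flagged sample is overwritten, which reflects the abstract's finding that a standard U-Net already copes with moderate error rates: labels that are not flagged are left unchanged.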
