Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof

arXiv cs.CV / 4/30/2026

Key Points

  • The paper argues that reliable AI-based classification of COVID-19 chest X-rays requires lung segmentation, and notes that most prior work skipped this step, which raises concerns about the reliability of its predictions.
  • Using class activation mapping on CNNs, the study visually assesses what the model attends to and finds evidence supporting the need for lung-region segmentation.
  • It compares models trained on augmented and non-augmented datasets, finding that test accuracy drops significantly once the number of augmented images passes a certain threshold, which the authors read as a sign of overfitting.
  • The authors propose a methodology called SDL-COVID and report strong performance for COVID-19 detection, including 95.21% precision and a reduced false negative rate.
  • Overall, the study provides practical guidance on when segmentation and augmentation are beneficial and when they can harm generalization in medical imaging AI pipelines.
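The two metrics quoted in the key points, precision and false negative rate, both follow from confusion-matrix counts. The sketch below shows how they could be computed for a binary COVID-19 vs. non-COVID-19 classifier. The function names and the toy labels are assumptions made for illustration; they are not the paper's code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("precision is undefined without positive predictions")
    return tp / (tp + fp)


def false_negative_rate(fn: int, tp: int) -> float:
    """Share of actual positives that were missed: FN / (FN + TP)."""
    if fn + tp == 0:
        raise ValueError("false negative rate is undefined without positive cases")
    return fn / (fn + tp)


# Toy labels for illustration only (1 = COVID-19, 0 = non-COVID-19).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(precision(tp, fp))            # 0.75
print(false_negative_rate(fn, tp))  # 0.25
```

In a screening setting the false negative rate matters alongside precision, because a false negative is an infected patient the model failed to flag.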

Abstract

Purpose: Rapid and reliable diagnostic tools are crucial for managing respiratory diseases like COVID-19, where chest X-ray analysis coupled with artificial intelligence techniques has proven invaluable. However, most existing works on X-ray images have not considered lung segmentation, raising concerns about their reliability. Additionally, some have employed disproportionate and impractical augmentation techniques, making models less generalized and prone to overfitting. This study presents a critical analysis of both issues and proposes a methodology (SDL-COVID) for more reliable classification of chest X-rays for COVID-19 detection.

Methods: We use class activation mapping to obtain a visual understanding of the predictions made by Convolutional Neural Networks (CNNs), validating the necessity of lung segmentation. To analyze the effect of data augmentation, deep learning models are implemented on two levels: one for an augmented dataset and another for a non-augmented dataset.

Results: Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Regarding data augmentation, test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting.

Conclusion: Our proposed methodology, SDL-COVID, achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.
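The abstract does not say which class activation mapping variant the authors use. As a minimal sketch, the original CAM formulation can be written in a few lines of NumPy, assuming a CNN whose last convolutional layer feeds a global average pooling layer followed by a single linear classifier. The helper `attention_inside_mask` is a hypothetical addition that puts a number on how much of the heat map falls inside a lung mask; the paper itself describes visual inspection under expert supervision, not this score.

```python
import numpy as np


def class_activation_map(feature_maps: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Weighted sum of the last conv layer's feature maps for one class.

    feature_maps:  shape (K, H, W), activations of the K final conv channels.
    class_weights: shape (K,), linear-layer weights connecting the K pooled
                   channels to the class of interest.
    Returns an (H, W) heat map rescaled to the range [0, 1].
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    class_weights = np.asarray(class_weights, dtype=float)
    if feature_maps.ndim != 3 or class_weights.shape != (feature_maps.shape[0],):
        raise ValueError("expected feature_maps (K, H, W) and class_weights (K,)")
    # Contract over the channel axis: result has shape (H, W).
    cam = np.tensordot(class_weights, feature_maps, axes=1)
    # Min-max rescaling so the map can be overlaid on the image.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def attention_inside_mask(cam: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical score: fraction of heat-map mass that lies inside the mask."""
    total = cam.sum()
    if total == 0:
        raise ValueError("heat map is all zeros")
    return float(cam[np.asarray(mask, dtype=bool)].sum() / total)


# Tiny synthetic example: two 2x2 feature maps, not real X-ray activations.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 1.0]]])
weights = np.array([2.0, 1.0])
lung_mask = np.array([[True, False], [False, False]])

heat_map = class_activation_map(features, weights)
print(heat_map)                                    # [[1.  0. ] [0.  0.5]]
print(attention_inside_mask(heat_map, lung_mask))  # about 0.667
```

A low score on unsegmented images would suggest the classifier is relying on regions outside the lungs, such as text markers or image borders, which is the kind of behavior the heat maps are meant to expose.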