Synthetic Melanoma Image Generation and Evaluation Using Generative Adversarial Networks

arXiv cs.CV / 3/17/2026

Key Points

  • The paper systematically benchmarks four GAN architectures (DCGAN, StyleGAN2, and two StyleGAN3 variants) for high-resolution melanoma image synthesis on ISIC 2018 and ISIC 2020, with unified preprocessing and hyperparameter exploration emphasizing R1 regularization.
  • The evaluation uses a multi-faceted protocol (FID, FMD, qualitative dermoscopic inspection, a frozen EfficientNet-based melanoma detector, and independent review by two board-certified dermatologists) to assess both statistical quality and diagnostic relevance.
  • StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, with FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at gamma=0.8, and the frozen classifier recognizing 83% of its synthetic images as melanoma.
  • Dermatologists distinguish synthetic from real images with 66.5% accuracy (chance 50%), and exhibit low inter-rater agreement (kappa = 0.17), indicating substantial realism of the generated images.
  • Augmenting the real training data with StyleGAN2-generated melanoma images improves melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set, demonstrating a practical benefit for addressing class imbalance in melanoma ML pipelines.
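
The inter-rater agreement figure in the reader study is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The summary does not say how the authors computed it, so the following is only a minimal sketch of the standard two-rater formula. The function name `cohen_kappa` and the toy "real"/"synthetic" calls are illustrative assumptions, not data from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls from two raters on eight images (not from the paper).
calls_a = ["synthetic", "synthetic", "real", "real", "synthetic", "real", "synthetic", "real"]
calls_b = ["synthetic", "real", "real", "synthetic", "synthetic", "real", "real", "real"]
print(cohen_kappa(calls_a, calls_b))
```

In this toy example the raters agree on 5 of 8 images (0.625) while chance agreement is 0.5, so kappa comes out at 0.25. A value near 0, such as the reported 0.17, means the raters agree only slightly more often than their label frequencies alone would predict.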

Abstract

Melanoma is the most lethal form of skin cancer, and early detection is critical for improving patient outcomes. Although dermoscopy combined with deep learning has advanced automated skin-lesion analysis, progress is hindered by limited access to large, well-annotated datasets and by severe class imbalance, where melanoma images are substantially underrepresented. To address these challenges, we present the first systematic benchmarking study comparing four GAN architectures (DCGAN, StyleGAN2, StyleGAN3-T, and StyleGAN3-R) for high-resolution melanoma-specific synthesis. We train and optimize all models on two expert-annotated benchmarks (ISIC 2018 and ISIC 2020) under unified preprocessing and hyperparameter exploration, with particular attention to R1 regularization tuning. Image quality is assessed through a multi-faceted protocol combining distribution-level metrics (FID), sample-level representativeness (FMD), qualitative dermoscopic inspection, downstream classification with a frozen EfficientNet-based melanoma detector, and independent evaluation by two board-certified dermatologists. StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, attaining FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at gamma=0.8. The frozen classifier recognizes 83% of StyleGAN2-generated images as melanoma, while dermatologists distinguish synthetic from real images at only 66.5% accuracy (chance = 50%), with low inter-rater agreement (kappa = 0.17). In a controlled augmentation experiment, adding synthetic melanoma images to address class imbalance improved melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set. These findings demonstrate that StyleGAN2-generated melanoma images preserve diagnostically relevant features and can provide a measurable benefit for mitigating class imbalance in melanoma-focused machine learning pipelines.
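
The augmentation result is reported as ROC AUC on a held-out test set of real images. As a reminder of what that number measures, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen melanoma image receives a higher score than a randomly chosen non-melanoma image, with ties counted as one half. This is not the authors' evaluation code. The function name `roc_auc` and the labels and scores are hypothetical, chosen with few positives to mimic class imbalance.

```python
def roc_auc(labels, scores):
    """ROC AUC from its pairwise (Mann-Whitney) definition.

    labels: 1 for melanoma, 0 for non-melanoma.
    scores: classifier scores, higher meaning more likely melanoma.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: 3 melanoma images among 7 (not data from the paper).
test_labels = [1, 1, 1, 0, 0, 0, 0]
test_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(test_labels, test_scores))
```

Here 10 of the 12 melanoma/non-melanoma pairs are ordered correctly, giving an AUC of about 0.833. Because AUC depends only on this ranking and not on a decision threshold, it is a common choice for comparing classifiers trained with and without synthetic augmentation on an imbalanced test set. The quadratic pair loop is fine for illustration; a rank-based implementation would be preferable for large test sets.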