Perturb-and-Restore: Simulation-driven Structural Augmentation Framework for Imbalance Chromosomal Anomaly Detection

arXiv cs.CV / 4/2/2026

Key Points

  • The paper proposes “Perturb-and-Restore (P&R),” a simulation-driven framework to mitigate severe class imbalance and scarcity in structural chromosomal anomaly detection datasets.
  • P&R generates synthetic abnormal chromosomes by perturbing banding patterns of normal chromosomes and then uses a restoration diffusion network to reconstruct continuous chromosome content and edges, reducing dependence on rare real abnormal samples.
  • It further improves training data quality with “energy-guided adaptive sampling,” an online strategy that prioritizes high-quality synthetic samples based on energy scores derived from real-sample energy distributions.
  • The authors build a large structural anomaly dataset with 260,000+ chromosome images, including 4,242 abnormal samples across 24 categories, and report state-of-the-art results with average gains of 8.92% sensitivity, 8.89% precision, and 13.79% F1 across categories.
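
As an illustration of the perturbation step in the second key point, the following minimal Python sketch rearranges rows of a toy image along the chromosome's long axis. The specific operations (segment inversion and deletion), the function names `invert_segment` and `delete_segment`, and the toy array are assumptions made for illustration. The summary does not specify which perturbations P&R applies, and the restoration diffusion network that would repair the resulting seams is not modeled here.

```python
import numpy as np

def invert_segment(chrom, start, end):
    """Hypothetical perturbation: reverse rows [start, end) along the long axis."""
    out = chrom.copy()
    # Read from the untouched input so source and destination never overlap.
    out[start:end] = chrom[start:end][::-1]
    return out

def delete_segment(chrom, start, end):
    """Hypothetical perturbation: drop rows [start, end) and join the remainder."""
    return np.concatenate([chrom[:start], chrom[end:]], axis=0)

# Toy "chromosome": 8 rows (long axis) x 3 columns; each row stands for one band.
normal = np.repeat(np.arange(8, dtype=np.float32)[:, None], 3, axis=1)

inverted = invert_segment(normal, 2, 6)  # band order becomes 0 1 5 4 3 2 6 7
deleted = delete_segment(normal, 2, 6)   # bands 2..5 removed, 4 rows remain
```

In a real pipeline, hard edits like these would leave visible discontinuities at the segment boundaries, which is the gap the paper's restoration network is described as filling.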

Abstract

Detecting structural chromosomal abnormalities is crucial for accurate diagnosis and management of genetic disorders. However, collecting sufficient structural abnormality data is extremely challenging and costly in clinical practice, and not all abnormal types can be readily collected. As a result, deep learning approaches face significant performance degradation due to the severe imbalance and scarcity of abnormal chromosome data. To address this challenge, we propose Perturb-and-Restore (P&R), a simulation-driven structural augmentation framework that effectively alleviates data imbalance in chromosome anomaly detection. The P&R framework comprises two key components: (1) Structure Perturbation and Restoration Simulation, which generates synthetic abnormal chromosomes by perturbing chromosomal banding patterns of normal chromosomes followed by a restoration diffusion network that reconstructs continuous chromosome content and edges, thus eliminating reliance on rare abnormal samples; and (2) Energy-guided Adaptive Sampling, an energy score-based online selection strategy that dynamically prioritizes high-quality synthetic samples by referencing the energy distribution of real samples. To evaluate our method, we construct a comprehensive structural anomaly dataset consisting of over 260,000 chromosome images, including 4,242 abnormal samples spanning 24 categories. Experimental results demonstrate that the P&R framework achieves state-of-the-art (SOTA) performance, surpassing existing methods with an average improvement of 8.92% in sensitivity, 8.89% in precision, and 13.79% in F1-score across all categories.
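
The energy-guided sampling idea in component (2) can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the abstract does not define the energy score, so the sketch assumes the common free-energy form E(x) = -T · logsumexp(logits / T) computed from classifier logits, and it assumes a Gaussian-shaped weighting around the mean energy of real samples. The names `energy_score` and `sampling_weights`, and all toy data, are hypothetical.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Assumed energy: -T * logsumexp(logits / T), computed stably per row."""
    z = logits / temperature
    m = z.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return -temperature * lse

def sampling_weights(synthetic_energy, real_energy, eps=1e-8):
    """Assumed rule: weight synthetic samples by closeness to the real energy mean."""
    mu = real_energy.mean()
    sigma = real_energy.std() + eps
    w = np.exp(-0.5 * ((synthetic_energy - mu) / sigma) ** 2)
    total = w.sum()
    if total == 0.0:
        # Every sample is far from the real distribution: fall back to uniform.
        return np.full_like(w, 1.0 / len(w))
    return w / total

rng = np.random.default_rng(0)
real_logits = rng.normal(size=(64, 24))       # toy logits for real samples
synthetic_logits = rng.normal(size=(8, 24))   # toy logits for synthetic samples

weights = sampling_weights(energy_score(synthetic_logits),
                           energy_score(real_logits))
# Draw a training batch of synthetic indices, favoring "real-looking" energies.
batch_idx = rng.choice(len(weights), size=4, replace=True, p=weights)
```

In an online setting, the weights would be recomputed as the detector's logits change during training, so the set of favored synthetic samples shifts over time. How P&R actually schedules this is not stated in the abstract.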