DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
arXiv cs.LG / 2026/4/3
Key Points
- DISCO-TAB is a new hierarchical reinforcement learning framework that fine-tunes an LLM and uses a multi-objective discriminator system to generate privacy-preserving synthetic clinical tabular data from EHRs.
- The method evaluates generation at multiple feedback granularities (token, sentence, feature, and row) to better capture complex non-linear dependencies and address severe class imbalance that can otherwise produce clinically invalid but statistically plausible records.
- It incorporates Automated Constraint Discovery and Inverse-Frequency Reward Shaping to preserve latent medical logic and mitigate minority-class collapse during synthesis.
- Experiments on small-sample, high-dimensional medical datasets (e.g., Heart Failure and Parkinson’s) show up to a 38.2% improvement in downstream clinical classifier utility over GAN and diffusion baselines, with strong statistical fidelity (JSD < 0.01) and resistance to membership inference attacks.
- The authors position DISCO-TAB as a step toward a new standard for trustworthy synthetic healthcare data generation that maintains both utility and privacy guarantees.
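
The summary names Inverse-Frequency Reward Shaping but does not give its formula. As a rough illustration of the general idea only, the sketch below scales a base reward by a weight inversely proportional to class frequency, so that correctly generated minority-class rows earn more reward. The function names (`inverse_frequency_weights`, `shaped_reward`) and the `n / (k * count)` normalization are assumptions made for this example, not the paper's definition.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n / (k * count): rarer classes get larger weights.

    Assumed normalization (hypothetical, not taken from the paper):
    n is the number of samples and k is the number of distinct classes.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def shaped_reward(base_reward, label, weights):
    """Scale a base reward by the inverse-frequency weight of the row's class."""
    return base_reward * weights[label]


# Toy imbalanced label set: 90 majority rows, 10 minority rows.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)

majority_reward = shaped_reward(1.0, 0, weights)
minority_reward = shaped_reward(1.0, 1, weights)
```

With this weighting, a minority-class row receives a larger shaped reward than a majority-class row for the same base reward, which is one plausible way to discourage a generator from collapsing onto the majority class.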

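The fidelity figure quoted above (JSD < 0.01) refers to Jensen-Shannon divergence between real and synthetic feature distributions. The summary does not state the log base or how features are binned, so the following is only a minimal sketch for a single categorical column, assuming base-2 logarithms (which bound the value in [0, 1]). The helper names `distribution`, `kl`, and `jsd` are hypothetical.

```python
import math
from collections import Counter


def distribution(values, support):
    """Empirical probability of each category in `support`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in support]


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy example: one categorical column in real vs. synthetic data.
support = ["A", "B"]
real = ["A"] * 50 + ["B"] * 50
synthetic = ["A"] * 52 + ["B"] * 48

score = jsd(distribution(real, support), distribution(synthetic, support))
```

Identical distributions give a divergence of 0 and fully disjoint ones give 1 under this convention, so a value below 0.01 indicates that the synthetic marginal tracks the real one closely. A per-column score like this says nothing about joint dependencies between features, which is why the summary also reports downstream classifier utility.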



