Donor-Aware scRNA-seq Benchmarks for IBD Classification

arXiv stat.ML / 5/6/2026

Key Points

  • The paper argues that scRNA-seq disease classification for IBD must use donor-aware cross-validation, because random cell-splitting creates pseudoreplication and can overstate performance.
  • It introduces a donor-aware benchmark across two independent IBD cohorts (SCP259 for ulcerative colitis and Kong 2023 for Crohn’s disease) comparing three feature representations: CLR composition, GatedStructuralCFN dependency embeddings, and scVI latent embeddings.
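  • On SCP259, compartment-stratified CLR reaches AUROC 0.956 and CFN on the same features reaches 0.978. In the Kong cohort, CFN is numerically ahead of linear CLR in the colon (0.960 vs. 0.900), while terminal ileum classification favors CatBoost on CLR features (0.967) over CFN (0.811). No inter-method comparison reached statistical significance at n<=34 donors per region.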
  • Cross-dataset transfer is asymmetric: CD→UC reaches AUC 0.833 with XGBoost on CLR features restricted to four shared cell types, while UC→CD performs at chance. Compartment-stratified features also improve CFN edge stability by removing the spurious, unit-sum-induced instability seen with global composition.
  • Code for the benchmark is available at https://github.com/Jonathan-321/sfn-scrna-study. The authors conclude that compartment-aware feature construction is critical for both classification performance and structural interpretability.
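
To make the first point concrete, here is a minimal sketch of donor-aware splitting, assuming each cell carries a donor label. It is not the paper's implementation: the function name `donor_aware_folds` and the toy donor IDs are hypothetical, and the sketch does not stratify folds by disease label. It only shows the core constraint that every cell from a given donor lands on the same side of each split.

```python
import random


def donor_aware_folds(cell_donors, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) over cells so that no donor spans both sides."""
    donors = sorted(set(cell_donors))
    if n_splits > len(donors):
        raise ValueError("n_splits cannot exceed the number of donors")
    rng = random.Random(seed)
    rng.shuffle(donors)
    # Deal donors round-robin into folds; every cell follows its donor.
    fold_of = {donor: i % n_splits for i, donor in enumerate(donors)}
    for k in range(n_splits):
        test = [i for i, d in enumerate(cell_donors) if fold_of[d] == k]
        train = [i for i, d in enumerate(cell_donors) if fold_of[d] != k]
        yield train, test


# Toy data (hypothetical): 6 donors with 4 cells each.
cell_donors = [f"donor{d}" for d in range(6) for _ in range(4)]
folds = list(donor_aware_folds(cell_donors, n_splits=3))

for train, test in folds:
    train_donors = {cell_donors[i] for i in train}
    test_donors = {cell_donors[i] for i in test}
    # A random split over cells would usually violate this check.
    assert train_donors.isdisjoint(test_donors)
```

In practice, scikit-learn's `GroupKFold` or `StratifiedGroupKFold` with `groups` set to the donor IDs enforces the same constraint, and the latter also balances class labels across folds.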

Abstract

Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI variational autoencoder latent embeddings. The cohorts are the SCP259 ulcerative colitis atlas (UC vs. Healthy, n=30 donors, 51 cell types) and the Kong 2023 Crohn's disease atlas (CD vs. Healthy, n=71 donors, 55-68 cell types across three intestinal regions). Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on SCP259; GatedStructuralCFN on the same features achieves 0.978 +/- 0.050. In the Kong cohort, CFN achieves its best performance in the colon region (0.960 +/- 0.055 after feature filtering), exceeding linear CLR (0.900 +/- 0.100), while terminal ileum classification is dominated by CLR-based models (CatBoost CLR 0.967 +/- 0.075 vs. CFN 0.811 +/- 0.164). Cross-dataset transfer (CD->UC, four shared cell types) achieves AUC 0.833 with XGBoost CLR; the reverse direction performs at chance. CFN edge stability analysis shows that compartment-wise composition eliminates spurious unit-sum-induced instability present in global composition (Jaccard 0.026 vs. top-20 recurrence 1.0). CFN shows a consistent numerical advantage over linear models in the colon region of CD (AUROC 0.960 vs. 0.900), though no inter-method comparison reached statistical significance at n<=34 donors per region. Compartment-aware feature construction is critical for both classification performance and structural interpretability. Code: https://github.com/Jonathan-321/sfn-scrna-study
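
The CLR features named in the abstract can be illustrated with a short sketch. The transform itself is standard: take the log of each part and subtract the mean log over all parts. Everything else here is an assumption made for illustration, not the paper's code: the pseudocount of 0.5 for zero counts, the reading of "compartment-stratified" as applying CLR separately within each compartment, the helper name `compartment_clr`, and the toy cell types and compartments.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    # The pseudocount (an assumed value) keeps log() defined for zero counts.
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def compartment_clr(counts_by_type, compartment_of, pseudocount=0.5):
    """Apply CLR within each compartment separately; return {cell_type: value}."""
    groups = {}
    for cell_type, compartment in compartment_of.items():
        groups.setdefault(compartment, []).append(cell_type)
    features = {}
    for types in groups.values():
        values = clr([counts_by_type.get(t, 0) for t in types], pseudocount)
        features.update(zip(types, values))
    return features


# Toy per-donor cell counts (hypothetical cell types and compartments).
compartment_of = {
    "T cell": "immune",
    "B cell": "immune",
    "Enterocyte": "epithelial",
    "Goblet": "epithelial",
}
donor_counts = {"T cell": 120, "B cell": 30, "Enterocyte": 400, "Goblet": 0}

global_features = clr([donor_counts[t] for t in compartment_of])
features = compartment_clr(donor_counts, compartment_of)
```

Under this reading, CLR values sum to zero within each compartment rather than across all cell types, so a shift in one compartment's abundance does not mechanically move the features of another. That is one plausible way to see why the abstract attributes the instability of global composition to the unit-sum constraint.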