A multi-stage soft computing framework for complex disease modelling and decision support: A liver cirrhosis case study

arXiv cs.LG / 4/29/2026

Key Points

  • The paper proposes an ML-driven, multi-stage decision framework to model complex diseases like liver cirrhosis using biomedical data despite high dimensionality, noise, and limited labeled samples.
  • It combines single-cell transcriptomic profiling with hdWGCNA (high-dimensional weighted gene co-expression network analysis) to stabilise gene modules, then restructures tabular molecular features into 2D disease maps that a convolutional neural network (CNN) analyses.
  • For therapeutic decision support, the framework adds molecular docking to evaluate candidate compounds, linking modelling outputs to drug exploration.
  • In the liver cirrhosis case study, the approach identifies an endothelial subpopulation associated with the disease and extracts seven robust signature genes.
  • The authors report that the CNN-based representation learning improves classification performance over conventional ML pipelines and argue the framework is disease-agnostic for other omics applications.
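
The summary does not say how tabular features are laid out on a 2D disease map, so the following is only a minimal sketch of one possible arrangement: pad the feature vector to the smallest square grid and fill it row by row. The function name `features_to_map`, the zero padding, and the sample values are all assumptions made for illustration, not the authors' method.

```python
import math


def features_to_map(values, fill=0.0):
    """Arrange a flat feature vector into the smallest square grid, row-major.

    Assumption: this layout rule is hypothetical. The source does not
    describe how features are positioned on the 2D map.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one feature")
    # Smallest side length such that side * side >= n.
    side = math.isqrt(n - 1) + 1
    # Pad the tail so the vector fills the grid exactly.
    padded = list(values) + [fill] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Seven placeholder values (invented for illustration), one per signature gene.
sample = [0.8, 0.1, 0.5, 0.3, 0.9, 0.2, 0.7]
grid = features_to_map(sample)
for row in grid:
    print(row)
```

With seven features this yields a 3 x 3 grid whose last two cells are padding. A grid like this could then be passed to a CNN as a single-channel image. In practice the ordering of features matters, because convolutions only see local neighbourhoods.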

Abstract

Liver cirrhosis is a major global health problem causing millions of deaths annually, and timely detection with aggressive treatment can significantly improve patients' quality of life. Modelling complex diseases from biomedical data is computationally challenging due to high dimensionality, strong feature correlations, noise, and limited labelled samples. Conventional Machine Learning (ML) pipelines often struggle with robustness, interpretability, and generalisation under such conditions. In this study, we propose an ML-driven multi-stage decision framework for complex disease modelling and therapeutic exploration. The framework integrates single-cell transcriptomic profiling, high-dimensional network-based feature stabilisation, multi-model learning, deep representation construction, and post-hoc decision support. Specifically, single-cell sequencing data were analysed to identify key cellular subpopulations, followed by high-dimensional weighted gene co-expression network analysis (hdWGCNA) to stabilise gene modules under sparsity and noise. To enhance non-linear feature interaction modelling, tabular molecular features were restructured into two-dimensional disease maps and analysed using a convolutional neural network (CNN). Finally, molecular docking was incorporated as a decision-support module to evaluate candidate therapeutic compounds. Using liver cirrhosis as a representative case, the framework identified a disease-associated endothelial subpopulation and extracted seven robust signature genes (HSPB1, GADD45A, CLDN5, ATP1B3, C1QBP, ENPP2, and PARL). The CNN-based representation learning module outperformed conventional pipelines in classification. The framework is disease-agnostic and readily extends to other omics-driven biomedical applications involving uncertainty, heterogeneity, and limited samples.
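
As a rough illustration of the co-expression idea behind the network step, the sketch below computes an unsigned soft-threshold adjacency in the style of standard WGCNA, where the adjacency between two genes is the absolute Pearson correlation raised to a power `beta`. It does not reproduce hdWGCNA's single-cell handling. The gene labels, expression values, and the choice `beta=6` are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor(g, h)| ** beta for every ordered gene pair.

    Assumption: beta=6 is an illustrative choice, not a value from the paper.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression profiles across four samples (hypothetical genes and values).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
adj = soft_threshold_adjacency(expr)
for pair, weight in sorted(adj.items()):
    print(pair, round(weight, 4))
```

Raising the correlation to a power suppresses weak, noise-driven correlations while keeping strong ones close to 1. This is why soft thresholding is often used before grouping genes into modules.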