An Integrative Genome-Scale Metabolic Modeling and Machine Learning Framework for Predicting and Optimizing Biofuel-Relevant Biomass Production in Saccharomyces cerevisiae

arXiv cs.LG / 3/27/2026


Key Points

  • The study introduces a computational framework that integrates the Yeast9 genome-scale metabolic model with machine learning and optimization to predict and improve biomass flux in Saccharomyces cerevisiae under varying glucose, oxygen, and ammonium conditions.
  • Random Forest and XGBoost regressors trained on 2,000 simulated flux profiles achieved R² of 0.99989 and 0.9990, respectively, when predicting biomass flux.
  • A variational autoencoder identified four metabolic clusters, while SHAP-based interpretability highlighted glycolysis, the TCA cycle, and lipid biosynthesis as key determinants of biomass production.
  • The authors report two in silico optimization results: simulated overexpression raised biomass flux to 0.979 gDW/hr, and Bayesian optimization of nutrient constraints increased it about 12-fold (from 0.0858 to 1.041 gDW/hr).
  • A generative adversarial network (GAN) proposes novel, stoichiometrically feasible flux configurations, pairing generative search with simulation-based validity checks to find new metabolic states for biofuel-relevant biomass engineering.
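
The two headline numbers above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `r_squared` and `fold_change` are hypothetical and do not come from the paper, and the only values taken from the source are the two reported biomass fluxes.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def fold_change(before, after):
    """Ratio of the optimized value to the baseline value."""
    return after / before


# Biomass fluxes reported in the summary (gDW/hr).
baseline_flux = 0.0858
optimized_flux = 1.041

fc = fold_change(baseline_flux, optimized_flux)
print(f"fold change: {fc:.2f}")
```

The ratio comes out slightly above 12, which is consistent with the "about 12-fold" wording.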

Abstract

Saccharomyces cerevisiae is a cornerstone organism in industrial biotechnology, valued for its genetic tractability and robust fermentative capacity. Accurately predicting biomass flux across diverse environmental and genetic perturbations remains a significant challenge for rational strain design. We present a computational framework combining the Yeast9 genome-scale metabolic model with machine learning and optimization to predict, interpret, and enhance biomass flux. Flux balance analysis generated 2,000 flux profiles by varying glucose, oxygen, and ammonium uptake rates. Random Forest and XGBoost regressors achieved R² of 0.99989 and 0.9990, respectively. A variational autoencoder revealed four distinct metabolic clusters, and SHAP analysis identified glycolysis, the TCA cycle, and lipid biosynthesis as key biomass determinants. In silico overexpression achieved a biomass flux of 0.979 gDW/hr, while Bayesian optimization of nutrient constraints produced a 12-fold increase (0.0858 to 1.041 gDW/hr). A generative adversarial network proposed stoichiometrically feasible novel flux configurations. This framework demonstrates how genome-scale simulation, interpretable ML, and generative modeling can advance yeast metabolic engineering.
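
In flux balance analysis, a flux vector v is usually called stoichiometrically feasible when it satisfies the steady-state condition S·v = 0 and stays within each reaction's lower and upper bounds. The sketch below shows that check on a hypothetical three-reaction toy network. It is not the Yeast9 model, and the abstract does not describe how the authors validate GAN outputs, so the function name, matrix, bounds, and tolerance are all assumptions made for illustration.

```python
def is_stoichiometrically_feasible(S, v, lower, upper, tol=1e-9):
    """Return True if S·v is zero (within tol) and every flux is within bounds."""
    if any(len(row) != len(v) for row in S):
        raise ValueError("each row of S needs one coefficient per reaction")
    if not (len(lower) == len(upper) == len(v)):
        raise ValueError("bounds must have one entry per reaction")
    # Steady state: net production of every internal metabolite is zero.
    balanced = all(
        abs(sum(coef * flux for coef, flux in zip(row, v))) <= tol
        for row in S
    )
    # Capacity constraints: each flux lies between its lower and upper bound.
    bounded = all(lo - tol <= flux <= hi + tol
                  for flux, lo, hi in zip(v, lower, upper))
    return balanced and bounded


# Hypothetical toy network: uptake -> A, A -> B, B -> biomass.
# Rows are metabolites (A, B); columns are the three reactions.
S = [
    [1.0, -1.0, 0.0],   # A: made by uptake, consumed by conversion
    [0.0, 1.0, -1.0],   # B: made by conversion, consumed by biomass
]
lower = [0.0, 0.0, 0.0]
upper = [10.0, 10.0, 10.0]

balanced_flux = [2.0, 2.0, 2.0]      # every metabolite is balanced
unbalanced_flux = [2.0, 1.0, 2.0]    # A accumulates, B is depleted

print(is_stoichiometrically_feasible(S, balanced_flux, lower, upper))
print(is_stoichiometrically_feasible(S, unbalanced_flux, lower, upper))
```

A check of this kind is cheap to run over any candidate flux vector, whether it comes from a generative model or from a solver.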