Computational framework for multistep metabolic pathway design

arXiv cs.LG / 4/16/2026


Key Points

  • The paper proposes a computational framework for multistep de novo metabolic pathway design by combining deep learning with a traditional retrobiosynthesis workflow.
  • It builds a training dataset from public metabolic reaction and enzymatic template databases, augmented by generating artificial reactions from enzymatic reaction templates.
  • Two neural-network-based ranking models are trained as binary classifiers to score the plausibility of candidate 1-step and 2-step pathways.
  • The models are integrated into a multistep retrobiosynthesis pipeline using enzymatic templates, with validation demonstrated by reproducing selected natural and non-natural metabolic pathways computationally.
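
The augmentation and labeling steps described in the points above can be illustrated with a minimal sketch. It assumes a toy representation in which compounds are plain strings and an enzymatic template is a substring rewrite. The paper does not specify its representation here, and a real pipeline would more likely use reaction templates over molecular graphs. The names `Template`, `apply_template`, and `build_dataset` are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Template(NamedTuple):
    """Toy stand-in for an enzymatic reaction template (hypothetical)."""
    name: str
    pattern: str      # substructure the template matches (here: a substring)
    replacement: str  # what the matched substructure is rewritten to


def apply_template(template: Template, substrate: str) -> List[str]:
    """Return one product per occurrence of the pattern in the substrate."""
    products = []
    start = substrate.find(template.pattern)
    while start != -1:
        end = start + len(template.pattern)
        products.append(substrate[:start] + template.replacement + substrate[end:])
        start = substrate.find(template.pattern, start + 1)
    return products


def build_dataset(
    known_reactions: Sequence[Tuple[str, str]],
    templates: Sequence[Template],
) -> List[Tuple[str, str, int]]:
    """Label assembled reactions 1 and template-generated artificial ones 0."""
    known = set(known_reactions)
    examples = [(s, p, 1) for s, p in sorted(known)]

    artificial = set()
    for substrate in sorted({s for s, _ in known}):
        for template in templates:
            for product in apply_template(template, substrate):
                # Keep only reactions that are not already in the assembled set.
                if product != substrate and (substrate, product) not in known:
                    artificial.add((substrate, product))

    examples += [(s, p, 0) for s, p in sorted(artificial)]
    return examples
```

Under these assumptions, the resulting `(substrate, product, label)` triples are the kind of input a binary classifier could be trained on, with the classifier's output then read as a plausibility score.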

Abstract

In silico tools are important for generating novel hypotheses and exploring alternatives in de novo metabolic pathway design. However, while many computational frameworks have been proposed for retrobiosynthesis, few successful examples of algorithm-guided xenobiotic biochemical retrosynthesis have been reported in the literature. Deep learning has improved the quality of synthesis and retrosynthesis in organic chemistry applications. Inspired by this progress, we explored combining deep learning of biochemical transformations with the traditional retrobiosynthetic workflow to improve in silico synthetic metabolic pathway designs. To develop our computational biosynthetic pathway design framework, we assembled metabolic reaction and enzymatic template data from public databases. A data augmentation procedure, adapted from the literature, was carried out to enrich the assembled reaction dataset with artificial metabolic reactions generated by enzymatic reaction templates. Two neural network-based pathway ranking models were trained as binary classifiers to distinguish assembled reactions from their artificial counterparts; each model outputs a scalar quantifying the plausibility of a 1-step or 2-step pathway. Combining these two models with enzymatic templates, we built a multistep retrobiosynthesis pipeline and validated it by computationally reproducing some natural and non-natural pathways.
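
To make the final step concrete, here is a minimal sketch of how a template-driven multistep retrobiosynthesis search could use 1-step and 2-step plausibility scores. The abstract does not say how the two models are combined, so several choices here are assumptions: the beam search, the cycle check, and the scoring rule (the 1-step score for single-step pathways, otherwise the mean 2-step score over consecutive step pairs). The scorers and the `retro_expand` callable are hypothetical placeholders for the trained models and the enzymatic templates.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, str]  # (precursor, product), written in the forward direction
Pathway = List[Step]


def score_pathway(
    path: Pathway,
    score_1step: Callable[[Step], float],
    score_2step: Callable[[Step, Step], float],
) -> float:
    """Assumed rule: 1-step score alone, else mean 2-step score over adjacent pairs."""
    if len(path) == 1:
        return score_1step(path[0])
    pairs = list(zip(path, path[1:]))
    return sum(score_2step(a, b) for a, b in pairs) / len(pairs)


def retro_search(
    target: str,
    retro_expand: Callable[[str], List[str]],  # compound -> candidate precursors
    starting_compounds: Set[str],
    score_1step: Callable[[Step], float],
    score_2step: Callable[[Step, Step], float],
    max_steps: int = 3,
    beam_width: int = 5,
) -> List[Tuple[float, Pathway]]:
    """Beam search backward from the target; returns pathways ranked by score."""
    beam: List[Tuple[float, str, Pathway]] = [(1.0, target, [])]
    finished: List[Tuple[float, Pathway]] = []

    for _ in range(max_steps):
        candidates = []
        for _, compound, path in beam:
            seen = {target} | {pre for pre, _ in path}
            for precursor in retro_expand(compound):
                if precursor in seen:  # avoid cycling back to a visited compound
                    continue
                new_path = [(precursor, compound)] + path
                score = score_pathway(new_path, score_1step, score_2step)
                if precursor in starting_compounds:
                    finished.append((score, new_path))
                else:
                    candidates.append((score, precursor, new_path))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if not beam:
            break

    finished.sort(key=lambda f: f[0], reverse=True)
    return finished
```

In this sketch a pathway counts as complete once its earliest precursor belongs to `starting_compounds`, for example a set of native host metabolites. Other stopping criteria and other ways of combining the two scores are equally possible.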