AI Navigate

MS2MetGAN: Latent-space adversarial training for metabolite-spectrum matching in MS/MS database search

arXiv cs.LG / 3/17/2026

📰 News | Ideas & Deep Analysis | Models & Research

Key Points

  • MS2MetGAN presents a latent-space adversarial training framework that reframes metabolite-spectrum matching as aligning latent vectors learned by autoencoders for both metabolites and MS/MS spectra.
  • A generative adversarial network (GAN) generates latent vectors of decoy metabolites, which are paired with spectra to form negative samples for training.
  • The approach aims to improve identification accuracy in MS/MS database searches compared with existing metabolite identification methods.
  • The authors report experimental results in which MS2MetGAN achieves better overall performance than existing metabolite identification methods.
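
To make the "matching between latent vectors" idea concrete, here is a minimal sketch of ranking database candidates against one spectrum once both have been embedded. It is not the paper's scoring model: cosine similarity, the function names, and the toy vectors are all assumptions chosen for illustration, and in practice the latent vectors would come from trained autoencoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(spectrum_latent, candidate_latents):
    """Return candidate names sorted by descending similarity to the spectrum."""
    scored = [(cosine(spectrum_latent, z), name)
              for name, z in candidate_latents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

# Toy latent vectors, made up for illustration only.
spectrum_z = [0.9, 0.1, 0.0]
candidates = {
    "candidate_A": [0.0, 1.0, 0.0],
    "candidate_B": [1.0, 0.0, 0.0],
    "candidate_C": [0.5, 0.5, 0.0],
}
ranking = rank_candidates(spectrum_z, candidates)
print(ranking)
```

The candidate whose latent vector points in nearly the same direction as the spectrum's is ranked first, which mirrors the database-search goal of giving the true match the highest score.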

Abstract

Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite-spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite-spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite-spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
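
The negative-sample construction described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_generator` is an untrained stand-in for a GAN generator (a single linear layer with tanh), `LATENT_DIM`, `build_training_pairs`, and the toy latent vectors are hypothetical, and the adversarial training of the generator is omitted entirely.

```python
import math
import random

LATENT_DIM = 4  # assumed latent size, chosen only for this toy example

def toy_generator(noise, weights):
    """Stand-in for a trained GAN generator: one linear layer, then tanh."""
    return [math.tanh(sum(w * n for w, n in zip(row, noise))) for row in weights]

def build_training_pairs(true_pairs, weights, decoys_per_spectrum, rng):
    """Label true (metabolite, spectrum) latent pairs 1 and decoy pairs 0."""
    samples = []
    for metabolite_z, spectrum_z in true_pairs:
        samples.append((metabolite_z, spectrum_z, 1))
        for _ in range(decoys_per_spectrum):
            # Sample noise and map it to a decoy metabolite latent vector.
            noise = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
            decoy_z = toy_generator(noise, weights)
            samples.append((decoy_z, spectrum_z, 0))
    return samples

rng = random.Random(0)
# Random weights stand in for parameters a real generator would learn.
weights = [[rng.uniform(-1, 1) for _ in range(LATENT_DIM)]
           for _ in range(LATENT_DIM)]
# Made-up latent pairs; real ones would come from the two autoencoders.
true_pairs = [
    ([0.2, -0.1, 0.5, 0.3], [0.1, 0.0, 0.4, 0.2]),
    ([-0.4, 0.6, 0.0, 0.1], [-0.3, 0.5, 0.1, 0.0]),
]
samples = build_training_pairs(true_pairs, weights, decoys_per_spectrum=3, rng=rng)
print(len(samples), "training samples")
```

Each spectrum keeps its true metabolite as a positive example and is paired with generated decoy latents as negatives; a matching model would then be trained to separate the two labels.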