DMMRL: Disentangled Multi-Modal Representation Learning via Variational Autoencoders for Molecular Property Prediction

arXiv cs.LG · 2026-03-24


Key Points

  • The paper introduces DMMRL, a variational autoencoder-based method for disentangling molecular representations into shared (structure-relevant) and private (modality-specific) latent spaces to address entangled structure-property factors.
  • It improves cross-modal learning via orthogonality and alignment regularizations that encourage statistical independence and consistency across graphs, sequences, and geometries, instead of naively concatenating modalities.
  • A gated attention fusion module adaptively combines shared representations, aiming to capture richer inter-modal dependencies for molecular property prediction.
  • Experiments on seven benchmark datasets show DMMRL outperforming existing state-of-the-art approaches.
  • The authors release code and data publicly via GitHub, enabling replication and further research.
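The gated attention fusion described above can be illustrated with a minimal sketch: each modality's shared vector is passed through a sigmoid gate, and the gated vectors are combined with softmax attention weights. This is not the authors' implementation; the function names, weight shapes, and the exact gating/scoring forms here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_attention_fusion(shared_reps, W_gate, w_attn):
    """Fuse per-modality shared representations (hypothetical sketch).

    shared_reps : list of (d,) vectors, one per modality
    W_gate      : (d, d) gating weights (assumed shared across modalities)
    w_attn      : (d,) attention scoring weights
    """
    gated, scores = [], []
    for z in shared_reps:
        g = 1.0 / (1.0 + np.exp(-(W_gate @ z)))  # element-wise sigmoid gate
        gated.append(g * z)                       # suppress uninformative dims
        scores.append(w_attn @ z)                 # scalar relevance score
    alpha = softmax(np.array(scores))             # attention over modalities
    return sum(a * v for a, v in zip(alpha, gated))
```

With a single modality the attention weight is 1, so the output is just the gated vector; with several modalities the softmax adaptively reweights graph, sequence, and geometry representations.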

Abstract

Molecular property prediction constitutes a cornerstone of drug discovery and materials science, necessitating models capable of disentangling complex structure-property relationships across diverse molecular modalities. Existing approaches frequently exhibit entangled representations that conflate structural, chemical, and functional factors, thereby limiting interpretability and transferability. Furthermore, conventional methods inadequately exploit complementary information from graphs, sequences, and geometries, often relying on naive concatenation that neglects inter-modal dependencies. In this work, we propose DMMRL, which employs variational autoencoders to disentangle molecular representations into shared (structure-relevant) and private (modality-specific) latent spaces, enhancing both interpretability and predictive performance. The proposed variational disentanglement mechanism effectively isolates the most informative features for property prediction, while orthogonality and alignment regularizations promote statistical independence and cross-modal consistency. Additionally, a gated attention fusion module adaptively integrates shared representations, capturing complex inter-modal relationships. Experimental validation across seven benchmark datasets demonstrates DMMRL's superior performance relative to state-of-the-art approaches. The code and data underlying this article are freely available at https://github.com/xulong0826/DMMRL.
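The two regularizers named in the abstract can be sketched concretely: an orthogonality penalty discourages correlation between shared and private codes, and an alignment penalty pulls the shared codes of different modalities toward each other. This is a minimal illustration under assumed loss forms (squared cross-correlation and mean pairwise squared distance), not the paper's exact objectives.

```python
import numpy as np

def orthogonality_loss(Z_shared, Z_private):
    """Penalize correlation between shared and private latent codes.

    Z_shared, Z_private : (batch, dim) arrays for one modality.
    The squared entries of the cross-correlation matrix are summed,
    so the loss is 0 exactly when the two codes are uncorrelated.
    """
    C = Z_shared.T @ Z_private            # (dim, dim) cross-correlation
    return float(np.sum(C ** 2)) / Z_shared.shape[0]

def alignment_loss(shared_per_modality):
    """Encourage cross-modal consistency of shared codes.

    shared_per_modality : list of (batch, dim) arrays, one per modality
    (e.g. graph, sequence, geometry). Returns the mean pairwise
    squared distance between modalities' shared codes.
    """
    M = len(shared_per_modality)
    total, pairs = 0.0, 0
    for i in range(M):
        for j in range(i + 1, M):
            diff = shared_per_modality[i] - shared_per_modality[j]
            total += float(np.mean(diff ** 2))
            pairs += 1
    return total / pairs
```

In training, these terms would be added (with weighting hyperparameters) to the VAE reconstruction and KL objectives, so the shared space carries structure-relevant, modality-consistent information while the private spaces absorb modality-specific factors.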