
BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery

arXiv cs.LG / 3/20/2026


Key Points

  • The paper proposes BVSIMC, a Bayesian variable-selection-guided inductive matrix completion model that learns sparse latent embeddings while selecting relevant side features for drug discovery.
  • It claims to improve predictive accuracy and interpretability compared with state-of-the-art methods, demonstrated through simulations and two real-world tasks: predicting drug resistance in Mycobacterium tuberculosis and predicting new drug-disease associations for computational repositioning.
  • By enforcing sparsity, BVSIMC copes with high-dimensional, noisy side information of widely varying relevance, and in the two real applications it surfaces the most clinically meaningful side features.
  • Results on both synthetic and real data suggest the method could strengthen in silico drug discovery workflows and help researchers see which side features drive a prediction.

Abstract

Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties about drugs and genomic information about diseases) often greatly improves prediction performance. However, these side features can vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that enables variable selection from side features in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: 1) prediction of drug resistance in Mycobacterium tuberculosis, and 2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC outperforms several other state-of-the-art methods in terms of prediction. In our two real examples, BVSIMC further reveals the most clinically meaningful side features.
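
The abstract does not spell out BVSIMC's priors or inference procedure, so the sketch below does not implement it. As a rough illustration of the underlying idea only, it shows generic inductive matrix completion, where a partially observed drug-by-disease matrix M is approximated as X A Bᵀ Yᵀ from drug features X and disease features Y, with a non-Bayesian group-lasso penalty swapped in for Bayesian variable selection. A row of A or B shrunk exactly to zero corresponds to a side feature being dropped. All function names, dimensions, the penalty weight `lam`, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def row_soft_threshold(W, tau):
    """Group-lasso proximal step: shrink each row's L2 norm by tau.

    Rows whose norm is below tau become exactly zero, which drops the
    corresponding side feature from the model.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def objective(M, mask, X, Y, A, B, lam):
    """Squared error on observed entries plus row-wise group-lasso penalties."""
    resid = mask * (X @ A @ (Y @ B).T - M)
    penalty = np.linalg.norm(A, axis=1).sum() + np.linalg.norm(B, axis=1).sum()
    return 0.5 * np.sum(resid ** 2) + lam * penalty


def fit_sparse_imc(M, mask, X, Y, rank=2, lam=0.5, n_iter=200, seed=0):
    """Alternating proximal gradient for M ~ X A B^T Y^T on observed entries."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((X.shape[1], rank))
    B = 0.1 * rng.standard_normal((Y.shape[1], rank))
    x_norm2 = np.linalg.norm(X, 2) ** 2  # squared spectral norms
    y_norm2 = np.linalg.norm(Y, 2) ** 2
    for _ in range(n_iter):
        # Update A with B fixed; the step is 1 / (Lipschitz bound of the gradient).
        V = Y @ B
        R = mask * (X @ A @ V.T - M)
        step = 1.0 / (x_norm2 * np.linalg.norm(V, 2) ** 2 + 1e-12)
        A = row_soft_threshold(A - step * (X.T @ R @ V), step * lam)
        # Update B with the new A fixed.
        U = X @ A
        R = mask * (U @ (Y @ B).T - M)
        step = 1.0 / (y_norm2 * np.linalg.norm(U, 2) ** 2 + 1e-12)
        B = row_soft_threshold(B - step * (Y.T @ R.T @ U), step * lam)
    return A, B


# Synthetic example (hypothetical sizes): only the first two drug features
# and the first two disease features actually influence the associations.
rng = np.random.default_rng(1)
n_drugs, n_diseases, p, q, r = 30, 20, 8, 6, 2
X = rng.standard_normal((n_drugs, p))
Y = rng.standard_normal((n_diseases, q))
A_true = np.zeros((p, r))
A_true[:2] = rng.standard_normal((2, r))
B_true = np.zeros((q, r))
B_true[:2] = rng.standard_normal((2, r))
M = X @ A_true @ (Y @ B_true).T + 0.1 * rng.standard_normal((n_drugs, n_diseases))
mask = (rng.random((n_drugs, n_diseases)) < 0.6).astype(float)  # 1 = observed

A_hat, B_hat = fit_sparse_imc(M, mask, X, Y, rank=r, lam=0.5)
kept = np.flatnonzero(np.linalg.norm(A_hat, axis=1) > 0)
print("drug features with nonzero rows:", kept)
```

Which rows survive depends on `lam` and the data, so the printed indices are something to inspect rather than a fixed result. A Bayesian treatment such as the one the paper describes would additionally quantify uncertainty about which features are selected, which this point-estimate sketch does not attempt.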