A Comparative Study of QSPR Methods on a Unique Multitask PAMPA dataset

arXiv cs.LG / 5/4/2026

Key Points

  • The paper introduces a new multitask PAMPA dataset covering 143 molecules tested in vitro for passive membrane permeability across six different model membranes.
  • It compares multiple molecular descriptor sets and regression approaches, from simple linear regression to a pre-trained transformer model.
  • The study focuses on how predictive performance must be balanced against model interpretability, especially when using machine learning methods.
  • Results suggest that expert-designed physico-chemical descriptors can be more effective than deep learning representations for studies with limited sample sizes.
  • The authors position the work as the most comprehensive attempt to simultaneously model multiple organ-specific PAMPA membranes so far, providing new membrane-specific permeability insights.

Abstract

We present a unique multitask dataset comprising 143 drug and drug-candidate molecules, each evaluated in vitro in parallel artificial membrane permeability assays (PAMPA) using six different model membranes. Using this resource, we systematically assess the effectiveness of various molecular descriptors and regression models in predicting passive membrane permeability. The studied models range from simple linear regression to a modern pre-trained transformer architecture. Particular attention is given to the trade-off between predictive performance and model interpretability, highlighting the challenges introduced by machine learning approaches. To our knowledge, this is the most comprehensive study to date on the simultaneous modeling of multiple organ-specific PAMPA membranes, offering novel insights into membrane-specific permeability profiles. We found that expert-designed physico-chemical property descriptors are better suited to a permeability study with a limited sample size than deep-learning-based representations.
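To make the simplest end of that model range concrete, here is a minimal sketch of how a single-descriptor linear regression could be scored separately for each membrane with leave-one-out cross-validation. It is not the authors' pipeline. The data are synthetic, and the membrane names, the logP-like descriptor, the slopes, and the noise level are all assumptions chosen for illustration; only the sample size (143) and the number of membranes (six) come from the abstract.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / TSS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without molecule i, then predict the held-out molecule.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / tss


# Synthetic stand-in data: NOT the paper's measurements.
rng = random.Random(0)
N_MOLECULES = 143  # sample size reported in the abstract

# Hypothetical logP-like physico-chemical descriptor, one value per molecule.
descriptor = [rng.uniform(-1.0, 5.0) for _ in range(N_MOLECULES)]

# Hypothetical membranes, each with an assumed sensitivity to the descriptor.
slopes = {f"membrane_{k}": 0.2 * k for k in range(1, 7)}
permeability = {
    name: [slope * x + rng.gauss(0.0, 0.3) for x in descriptor]
    for name, slope in slopes.items()
}

# One independent model per membrane: a single-task baseline.
q2_by_membrane = {
    name: loo_q2(descriptor, ys) for name, ys in permeability.items()
}
for name, q2 in q2_by_membrane.items():
    print(f"{name}: LOO Q^2 = {q2:.2f}")
```

Fitting each membrane independently keeps every coefficient directly readable, which is the interpretability side of the trade-off the abstract describes. A multitask model would instead share information across the six membranes, at the cost of a less transparent fit.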