KANEL: Kolmogorov-Arnold Network Ensemble Learning Enables Early Hit Enrichment in High-Throughput Virtual Screening

arXiv cs.LG / 3/30/2026


Key Points

  • The paper argues that early hit enrichment metrics such as positive predictive value among the top N ranked compounds (PPV@N) are more actionable for virtual screening than global measures such as AUC.
  • It introduces KANEL, an ensemble workflow that integrates interpretable Kolmogorov-Arnold Networks (KANs) with additional predictors (XGBoost, random forest, and multilayer perceptrons).
  • KANEL is trained using complementary molecular representations, including LillyMol descriptors, RDKit-derived descriptors, and Morgan fingerprints, to improve ranking performance.
  • The overall approach targets better prioritization of compounds for experimental follow-up in high-throughput virtual screening pipelines by optimizing for early enrichment.
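
To make the contrast between PPV@N and AUC concrete, here is a minimal sketch in plain Python. The function names `ppv_at_n` and `roc_auc` and the toy rankings are illustrative assumptions, not code or data from the paper. It shows two hypothetical rankings with identical AUC whose top-2 hit rates differ completely.

```python
def ppv_at_n(scores, labels, n):
    """Fraction of true actives among the n highest-scoring compounds."""
    if not 0 < n <= len(scores):
        raise ValueError("n must be between 1 and the number of compounds")
    # Sort compound indices by descending score (ties keep input order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 10 compounds, 3 actives, scores already in ranked order.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # model A: two actives at the very top
labels_b = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0]  # model B: actives just below the top

for name, labels in (("A", labels_a), ("B", labels_b)):
    print(name, "AUC =", round(roc_auc(scores, labels), 3),
          "PPV@2 =", ppv_at_n(scores, labels, 2))
```

Both toy rankings give an AUC of about 0.667, yet model A places two actives in its top 2 (PPV@2 = 1.0) while model B places none (PPV@2 = 0.0). When only a handful of compounds can be tested, that difference is the one that matters.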

Abstract

Machine learning models of chemical bioactivity are increasingly used to prioritize a small number of compounds in virtual screening libraries for experimental follow-up. In these applications, assessing model accuracy by early hit enrichment metrics, such as the Positive Predictive Value (PPV) calculated for the top N hits (PPV@N), is more appropriate and actionable than traditional global metrics such as AUC. We present KANEL, an ensemble workflow that combines interpretable Kolmogorov-Arnold Networks (KANs) with XGBoost, random forest, and multilayer perceptron models trained on complementary molecular representations (LillyMol descriptors, RDKit-derived descriptors, and Morgan fingerprints).
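
The abstract does not say how KANEL combines the outputs of its member models. As one plausible illustration only, the sketch below averages per-model ranks, a common way to merge scores that live on different scales, and then selects the top N compounds. The function names `rank_average` and `top_n` and all score values are hypothetical.

```python
def rank_average(score_lists):
    """Mean rank of each compound across models (rank 1 = highest score).

    Assumption: simple rank averaging; the source does not specify
    KANEL's actual combination rule.
    """
    n = len(score_lists[0])
    rank_sum = [0] * n
    for scores in score_lists:
        if len(scores) != n:
            raise ValueError("every model must score the same compounds")
        # Rank compounds by descending score (ties keep input order).
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            rank_sum[i] += rank
    return [s / len(score_lists) for s in rank_sum]


def top_n(mean_rank, n):
    """Indices of the n compounds with the lowest (best) mean rank."""
    return sorted(range(len(mean_rank)), key=lambda i: mean_rank[i])[:n]


# Hypothetical scores from three ensemble members for five compounds.
kan_scores = [0.9, 0.2, 0.6, 0.4, 0.1]
xgb_scores = [0.7, 0.3, 0.8, 0.5, 0.2]
rf_scores = [0.8, 0.1, 0.7, 0.9, 0.3]

consensus = rank_average([kan_scores, xgb_scores, rf_scores])
print(top_n(consensus, 2))  # compounds 0 and 2 are selected
```

Rank averaging is used here because it needs no score calibration across model types. A weighted average or a stacked meta-model would be equally valid alternatives under the same assumptions.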