Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions

arXiv cs.AI / 4/28/2026


Key Points

  • The paper proposes two separate agentic AI platforms: one that autonomously trains predictive ML models for protein-protein interactions (PPIs) and another that induces explicit, human-readable rules for those interactions.
  • The predictive platform uses a five-agent workflow covering autonomous data collection, verification, feature embedding, model design, and training/validation using three-way protein-disjoint cross-fold datasets.
  • Reported performance for the three-way protein-disjoint ensemble is 87.3% accuracy for human-human PPIs and 86.5% accuracy for human-virus PPIs.
  • The rule-induction platform generates interpretable rules from protein embeddings, physicochemical autocovariance descriptors, compartment annotations, pathway-domain overlap, and graph context. Human-human PPIs are described by two rules, while human-virus PPIs require a more complex weighted multi-rule set.
  • The induced rules are reported to align with SHAP-identified features from the predictive models, which the authors present as evidence that agentic AI can carry an ML workflow from data planning through execution and from rule induction through explanation.
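
The protein-disjoint splitting mentioned in the key points can be illustrated with a small sketch. The summary does not spell out the paper's exact protocol, so this assumes one common reading: every protein is assigned to exactly one of three groups, and a pair is kept only when both of its proteins fall in the same group, so that no protein appears in more than one fold. The function name, the toy protein IDs, and the choice to drop cross-group pairs are all hypothetical.

```python
import random
from collections import defaultdict

def protein_disjoint_folds(pairs, n_folds=3, seed=0):
    """Split (protein_a, protein_b, label) pairs into folds that share no proteins.

    Assumption (not from the paper): pairs whose proteins land in different
    groups are dropped rather than reassigned.
    """
    # Collect every protein that appears in any pair, in a stable order.
    proteins = sorted({p for a, b, _ in pairs for p in (a, b)})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    # Round-robin assignment gives each protein exactly one group.
    group = {p: i % n_folds for i, p in enumerate(proteins)}

    folds = defaultdict(list)
    dropped = 0
    for a, b, label in pairs:
        if group[a] == group[b]:
            folds[group[a]].append((a, b, label))
        else:
            dropped += 1  # pair straddles two groups
    return [folds[i] for i in range(n_folds)], dropped

# Hypothetical toy data: IDs and labels are placeholders, not real proteins.
toy_pairs = [
    ("P1", "P2", 1), ("P2", "P3", 0), ("P3", "P4", 1),
    ("P4", "P5", 0), ("P5", "P6", 1), ("P1", "P6", 0),
    ("P2", "P5", 1), ("P3", "P6", 0), ("P1", "P4", 1),
]

if __name__ == "__main__":
    folds, dropped = protein_disjoint_folds(toy_pairs)
    for i, fold in enumerate(folds):
        print(f"fold {i}: {fold}")
    print(f"dropped cross-group pairs: {dropped}")
```

Dropping cross-group pairs trades data volume for a stricter guarantee: a model evaluated on one fold has never seen either protein of any test pair during training.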

Abstract

We instruct an AI agent to construct two separate agentic AI platforms: one for autonomous training of predictive ML models for human-human and virus-human protein-protein interactions (PPIs), and the other for inducing explicit general rules governing those interactions. The first platform consists of five AI agents that handle autonomous data collection, data verification, feature embedding, model design, and training and validation on three-way protein-disjoint cross-fold datasets. The final three-way protein-disjoint ensemble achieves an accuracy of 87.3% for human-human PPIs and 86.5% for virus-human PPIs. For cross-checking and interpretability, the second platform replaces ML predictions with human-readable rules derived from protein embeddings, physicochemical autocovariance descriptors, compartment annotations, pathway-domain overlap, and graph contexts. Human-human PPI is captured by two induced rules, whereas virus-human PPI requires a more complex set of weighted rules. The rules induced by the second platform align with the SHAP-identified features from the predictive ML models built by the first platform. Taken together, our work demonstrates that agentic AI can orchestrate an ML workflow from data planning to execution and from rule induction to explanation, opening the door to various applications.
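
To make the idea of a weighted rule set concrete, here is a minimal sketch of how such rules could be evaluated. It is not the paper's induced rule set: the rule names, thresholds, weights, and feature keys below are hypothetical placeholders, chosen only to echo the descriptor families the abstract lists (embeddings, compartment annotations, pathway-domain overlap). The assumed decision procedure is a simple sum of the weights of the rules that fire, compared against a cutoff.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

@dataclass
class Rule:
    name: str
    weight: float
    condition: Callable[[Features], bool]

def apply_rules(rules: List[Rule], features: Features,
                cutoff: float) -> Tuple[bool, float, List[str]]:
    """Return (predicted_interaction, score, names_of_fired_rules)."""
    fired = [r for r in rules if r.condition(features)]
    score = sum(r.weight for r in fired)
    return score >= cutoff, score, [r.name for r in fired]

# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("high_embedding_similarity", 2.0,
         lambda f: f["embedding_cosine"] >= 0.7),
    Rule("shared_compartment", 1.0,
         lambda f: f["same_compartment"] == 1.0),
    Rule("pathway_domain_overlap", 1.0,
         lambda f: f["pathway_overlap"] > 0.0),
]

# Hypothetical feature vector for one protein pair.
example = {"embedding_cosine": 0.8, "same_compartment": 1.0, "pathway_overlap": 0.0}

if __name__ == "__main__":
    predicted, score, fired = apply_rules(RULES, example, cutoff=2.0)
    print(predicted, score, fired)
    # prints: True 3.0 ['high_embedding_similarity', 'shared_compartment']
```

Because the output lists which rules fired, each prediction carries its own explanation, which is the property that makes a rule set comparable against SHAP feature attributions from a trained model.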