Multi-Objective Reinforcement Learning for Generating Covalent Inhibitor Candidates

arXiv cs.LG / 4/23/2026

Key Points

  • The study presents a machine-learning pipeline that uses multi-objective reinforcement learning to generate covalent inhibitor candidates while balancing properties like binding affinity, selectivity, synthetic accessibility, and electrophilic reactivity.
  • A SMILES-based pretrained LSTM is optimized with policy-gradient RL, using Pareto crowding distance to balance competing scoring functions, for two targets: epidermal growth factor receptor (EGFR) and acetylcholinesterase (ACHE).
  • In 10,000-structure runs, the pipeline rediscovered known covalent inhibitors at rates of up to 0.50% for EGFR and 0.74% for ACHE. After further docking-based screening, candidate structures reached warhead-to-residue distances as short as 5.5 Å (EGFR) and 3.2 Å (ACHE).
  • Notably, the model generated covalent warhead motifs not present in the training data (allenes, 3-oxo-β-sultams, and α-methylene-β-lactones), each with independent literature support as a covalent warhead, indicating it can explore covalent chemical space beyond the training distribution.
  • The authors conclude the RL-guided approach could be a practical tool for medicinal chemists to support covalent drug discovery efforts.
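
To put the reported rediscovery rates in absolute terms, the short sketch below converts them into approximate structure counts. It assumes that each rate is the fraction of all 10,000 generated structures in a run that match a known covalent inhibitor; the summary does not say whether the denominator is restricted to valid or unique structures, so the counts are indicative only. The function and variable names are illustrative.

```python
# Reported upper rediscovery rates (percent) per target, from the summary above.
RUN_SIZE = 10_000
max_rates_percent = {"EGFR": 0.50, "ACHE": 0.74}


def rediscovered_count(rate_percent: float, run_size: int = RUN_SIZE) -> int:
    """Convert a percentage rate into an approximate number of structures.

    Assumption: the rate is taken over all structures generated in one run.
    """
    return round(rate_percent / 100 * run_size)


counts = {target: rediscovered_count(rate) for target, rate in max_rates_percent.items()}

for target, count in counts.items():
    print(f"{target}: up to about {count} rediscovered structures per run")
```

Under that assumption, the best runs correspond to roughly 50 (EGFR) and 74 (ACHE) rediscovered inhibitors out of 10,000 generated structures.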

Abstract

Rational design of covalent inhibitors requires simultaneously optimizing multiple properties, such as binding affinity, target selectivity, or electrophilic reactivity. This presents a multi-objective problem not easily addressed by screening alone. Here we present a machine learning pipeline for generating covalent inhibitor candidates using multi-objective reinforcement learning (RL), applied to two targets: epidermal growth factor receptor (EGFR) and acetylcholinesterase (ACHE). A SMILES-based pretrained LSTM serves as the generative model, optimized via policy gradient RL with Pareto crowding distance to balance competing scoring functions including synthetic accessibility, predicted covalent activity, residue affinity, and an approximated docking score. The pipeline rediscovers known covalent inhibitors at rates of up to 0.50% (EGFR) and 0.74% (ACHE) in 10,000-structure runs, with candidate structures achieving warhead-to-residue distances as short as 5.5 Å (EGFR) and 3.2 Å (ACHE) after further docking-based screening. More notably, the pipeline spontaneously generates structures bearing warhead motifs absent from the training data, including allenes, 3-oxo-β-sultams, and α-methylene-β-lactones, all of which have independent literature support as covalent warheads. These results suggest that RL-guided generation can explore covalent chemical space beyond its training distribution, and may be useful as a tool for medicinal chemists working on covalent drug discovery.
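
The abstract names Pareto crowding distance as the mechanism for balancing the scoring functions but does not describe how it is computed or how it feeds into the reward. As a rough illustration only, the sketch below implements the standard NSGA-II-style formulation: find the non-dominated (Pareto-optimal) score vectors, then give each one a crowding distance that is larger when its neighbors on the front are farther away. The two-objective score vectors, the convention that higher scores are better, and all function names are assumptions made for the example, not details taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def crowding_distance(scores: List[Sequence[float]]) -> List[float]:
    """NSGA-II-style crowding distance for a set of score vectors.

    Boundary points on each objective get infinite distance; interior points
    accumulate the normalized gap between their two neighbors per objective.
    """
    n = len(scores)
    if n == 0:
        return []
    dist = [0.0] * n
    for k in range(len(scores[0])):
        order = sorted(range(n), key=lambda i: scores[i][k])
        lo, hi = scores[order[0]][k], scores[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal on this objective: no spread to measure
        for pos in range(1, n - 1):
            gap = scores[order[pos + 1]][k] - scores[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


# Hypothetical (objective_1, objective_2) scores for four generated molecules.
scores = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
front = pareto_front(scores)
dist = crowding_distance([scores[i] for i in front])
print(front, dist)
```

In this toy example the fourth molecule is dominated by the second and drops out of the front. The two extreme molecules receive infinite distance, and the balanced one receives a finite value. One plausible use, again an assumption, is to reward molecules in sparsely populated regions of the front so that the generator does not collapse onto a single trade-off.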