Protein-Conditioned Multi-Objective Reinforcement Learning for Full-Length mRNA Design

arXiv cs.LG / 5/5/2026

Key Points

The study introduces ProMORNA, a multi-objective framework that generates full-length therapeutic mRNA transcripts de novo from a target protein sequence while balancing stability, translation efficiency, and immune safety considerations.
ProMORNA is built on a BART-style encoder-decoder model trained on more than 6 million natural protein–mRNA pairs to learn protein-to-transcript generation.
The work proposes Multi-Objective Group Relative Policy Optimization (MO-GRPO) to optimize multiple biological objectives simultaneously within a unified reinforcement learning approach.
In a case study using firefly luciferase as a held-out target (excluded from training and prompts), ProMORNA improves the in silico Pareto frontier for predicted half-life and translation efficiency versus supervised baselines.
The computational results also show higher predicted functional scores than a state-of-the-art baseline under the same evaluation pipeline. Based on this single held-out case study, the authors conclude that multi-objective RL is feasible for full-length mRNA design on unseen targets.
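
To make the idea behind MO-GRPO more concrete, here is a minimal sketch of a group-relative advantage computed over several objectives. The summary does not say how MO-GRPO combines its objectives, so everything here is an assumption for illustration only: each objective is standardized within a group of candidates sampled for the same protein, and the standardized scores are combined with a weighted sum. The function name `group_relative_advantages`, the weights, and the reward values are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, weights, eps=1e-8):
    """Combine several objectives into one advantage per candidate.

    rewards: list of reward vectors, one per candidate in the same group
             (all candidates sampled for the same target protein).
    weights: one weight per objective (hypothetical values, not from the paper).
    """
    advantages = [0.0] * len(rewards)
    for j, weight in enumerate(weights):
        column = [r[j] for r in rewards]
        mu, sigma = mean(column), pstdev(column)
        for i, value in enumerate(column):
            # Standardize within the group so objectives measured on
            # different scales contribute comparably (an assumption).
            advantages[i] += weight * (value - mu) / (sigma + eps)
    return advantages


# Made-up scores: (predicted half-life, predicted translation efficiency).
group = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
adv = group_relative_advantages(group, weights=[0.5, 0.5])
```

Because each objective is centered on the group mean, the advantages in a group sum to approximately zero. Candidates that beat their siblings on the weighted objectives are reinforced, and the rest are penalized, without a separate value network.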

Abstract

Designing therapeutic messenger RNA (mRNA) requires creating full-length transcripts that carefully balance stability, translation efficiency, and immune safety. To address this challenge, we propose ProMORNA, a multi-objective generation framework that produces complete mRNA transcripts de novo directly from a target protein sequence. Our approach begins by training a BART-style encoder-decoder model on over 6 million natural protein-mRNA pairs. We then introduce Multi-Objective Group Relative Policy Optimization (MO-GRPO) to simultaneously optimize for various biological objectives in a unified way. As a case study, we evaluated ProMORNA on the widely used firefly luciferase target, excluding it from both our supervised training data and the prompt pool. The results indicate that ProMORNA improves the in silico Pareto frontier for predicted half-life and translation efficiency relative to standard supervised baselines. Additionally, it achieves higher predicted functional scores than a state-of-the-art baseline under the same evaluation pipeline. These computational findings demonstrate the feasibility of using multi-objective reinforcement learning for full-length mRNA design on unseen targets.
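
The abstract's notion of a Pareto frontier over predicted half-life and translation efficiency can be illustrated with a short sketch. It assumes both scores are to be maximized and that each candidate transcript has already been scored by some predictor. The helper name `pareto_front` and the numbers are made up for illustration and are not results from the paper.

```python
def dominates(q, p):
    """True if q is at least as good as p on every objective and strictly better on one."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))


def pareto_front(points):
    """Return the points that no other point dominates (all objectives maximized)."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (half-life, translation efficiency) scores for five candidates.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(candidates)
```

A method "improves the frontier" in this sense when its non-dominated candidates lie beyond those of the baseline, so that for a given predicted half-life it offers a higher predicted translation efficiency, or vice versa.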