Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

arXiv cs.LG / 5/6/2026


Key Points

  • Proteo-R1 is presented as a new reasoning-guided protein design framework that separates molecular understanding from geometric generation.
  • It uses a dual-expert architecture where a multimodal LLM analyzes sequences/structures/text to identify key functional residues, especially those governing binding and specificity.
  • The identified residue-level decisions are enforced as hard constraints for a separate diffusion-based generation expert, enabling conditional co-design around fixed interaction anchors.
  • The approach aims to improve interpretability, controllability, and modular reuse of biochemical knowledge compared with prior models that largely rely on end-to-end continuous sampling.
  • The authors provide code, data, and demos via the project website.

Abstract

Deep learning in de novo protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge. We introduce Proteo-R1, a reasoning-guided protein design framework that explicitly decouples molecular understanding from geometric generation. Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model (MLLM) serves as an understanding expert, analyzing protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These residue-level decisions are then passed as hard constraints to a separate diffusion-based generation expert, which performs conditional co-design while respecting the fixed interaction anchors. This factorization mirrors how human experts approach molecular engineering: first, reasoning about critical interactions, then optimizing geometry subject to those constraints. By operationalizing reasoning as explicit residue-level commitments rather than latent textual guidance, Proteo-R1 achieves stable, interpretable, and modular integration of LLM reasoning with state-of-the-art geometric generative models. Code, data, and demos are available at https://smiles724.github.io/r1/.
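The hand-off described in the abstract (an understanding expert commits to specific residues, and a generation expert samples around them) can be illustrated with a toy sketch. This is not the authors' code or API: every name below (`AnchorConstraint`, `toy_understanding_expert`, `toy_generation_expert`) is hypothetical, the "understanding expert" returns hard-coded anchors instead of calling an MLLM, and the "generation expert" is a simple iterative resampler over amino-acid letters standing in for a real diffusion model over sequence and structure. The only point it shows is how residue-level commitments can be enforced as hard constraints by re-imposing them after every sampling step.

```python
import random
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


@dataclass(frozen=True)
class AnchorConstraint:
    """Hypothetical residue-level commitment: `position` must be `residue`."""
    position: int
    residue: str


def toy_understanding_expert(length):
    """Stand-in for the MLLM: returns fixed, hard-coded anchors (assumption)."""
    anchors = [AnchorConstraint(2, "W"), AnchorConstraint(5, "D")]
    return [a for a in anchors if a.position < length]


def apply_anchors(seq, anchors):
    """Overwrite anchored positions so the constraints always hold."""
    for a in anchors:
        seq[a.position] = a.residue


def toy_generation_expert(length, anchors, steps=10, seed=0):
    """Toy iterative sampler; NOT a diffusion model, only the masking pattern."""
    rng = random.Random(seed)
    # Start from pure noise, then impose the commitments.
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    apply_anchors(seq, anchors)
    for step in range(steps):
        # Resample a shrinking fraction of positions as "noise" decreases.
        resample_prob = 1.0 - (step + 1) / steps
        for i in range(length):
            if rng.random() < resample_prob:
                seq[i] = rng.choice(AMINO_ACIDS)
        # Hard constraint: re-impose the anchors after every step.
        apply_anchors(seq, anchors)
    return "".join(seq)


anchors = toy_understanding_expert(8)
design = toy_generation_expert(8, anchors)
```

Because the anchors are written back after each step, the free positions can change arbitrarily while the committed residues stay fixed, which is the sense in which the constraints are "hard" rather than soft guidance. How Proteo-R1 actually conditions its diffusion expert is described in the paper and project code, not here.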
