AI Navigate

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

arXiv cs.LG / 3/13/2026

📰 News, Ideas & Deep Analysis, Models & Research

Key Points

  • EvoFlows is a variable-length sequence-to-sequence protein modeling approach designed for protein engineering, enabling a limited, controllable number of insertions, deletions, and substitutions on a template protein sequence.
  • Unlike autoregressive and masked language models, EvoFlows predict both which mutation to perform and where it should occur by learning mutational trajectories between evolutionarily related sequences using edit flows.
  • In silico evaluations on diverse protein communities from UniRef and OAS show that EvoFlows capture protein sequence distributions with quality comparable to leading masked language models, while generating non-trivial yet natural-like mutants from a given template more effectively.
  • The work points to a more controllable, trajectory-aware approach to protein design that could influence future workflows and tooling in protein engineering.
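
To make the idea of a "limited, controllable number of insertions, deletions, and substitutions" concrete, here is a minimal Python sketch of applying a bounded set of positioned edits to a template sequence. It is only an illustration of the edit vocabulary, not the EvoFlows model or its code: the `Edit` class, the `apply_edits` function, the `max_edits` budget, and the example sequence are all hypothetical names and values chosen for this sketch, and it assumes at most one edit per template position.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class Edit:
    """One hypothetical edit: WHICH mutation (op, residue) and WHERE (pos)."""
    op: str                        # "sub", "ins", or "del"
    pos: int                       # 0-based index into the ORIGINAL template
    residue: Optional[str] = None  # new residue for "sub"/"ins"; unused for "del"


def apply_edits(template: str, edits: List[Edit], max_edits: int) -> str:
    """Apply at most `max_edits` edits to `template` and return the mutant."""
    if len(edits) > max_edits:
        raise ValueError(f"{len(edits)} edits exceed the budget of {max_edits}")
    # Simplifying assumption of this sketch: one edit per template position.
    if len({e.pos for e in edits}) != len(edits):
        raise ValueError("at most one edit per position is supported")

    for e in edits:
        if e.op not in ("sub", "ins", "del"):
            raise ValueError(f"unknown operation: {e.op!r}")
        # Insertions may also target the position just past the end (append).
        upper = len(template) + 1 if e.op == "ins" else len(template)
        if not 0 <= e.pos < upper:
            raise ValueError(f"position {e.pos} out of range for {e.op!r}")
        if e.op in ("sub", "ins") and e.residue not in AMINO_ACIDS:
            raise ValueError(f"invalid residue: {e.residue!r}")

    seq = list(template)
    # Apply from right to left so indices of pending edits stay valid.
    for e in sorted(edits, key=lambda e: e.pos, reverse=True):
        if e.op == "sub":
            seq[e.pos] = e.residue
        elif e.op == "ins":
            seq.insert(e.pos, e.residue)  # insert before template index pos
        else:  # "del"
            del seq[e.pos]
    return "".join(seq)


edits = [Edit("sub", 1, "R"), Edit("ins", 4, "G"), Edit("del", 7)]
print(apply_edits("MKTAYIAK", edits, max_edits=3))  # MRTAGYIA
```

Because insertions and deletions are allowed, the mutant's length can differ from the template's, which is what "variable-length" refers to. In this sketch the edit budget is enforced by a simple count check; how the actual model controls the number of edits is not described in this summary.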

Abstract

We introduce EvoFlows, a variable-length sequence-to-sequence protein modeling approach uniquely suited to protein engineering. Unlike autoregressive and masked language models, EvoFlows perform a limited, controllable number of insertions, deletions, and substitutions on a template protein sequence. In other words, EvoFlows predict not only _which_ mutation to perform, but also _where_ it should occur. Our approach leverages edit flows to learn mutational trajectories between evolutionarily related protein sequences, simultaneously modeling distributions of related natural proteins and the mutational paths connecting them. Through extensive _in silico_ evaluation on diverse protein communities from UniRef and OAS, we demonstrate that EvoFlows capture protein sequence distributions with a quality comparable to leading masked language models commonly used in protein engineering, while showing improved ability to generate non-trivial yet natural-like mutants from a given template protein.
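
One way to picture a "mutational path" between two related sequences is as an edit script: an ordered list of substitutions, insertions, and deletions that turns one sequence into the other. The Python sketch below derives a shortest such script with a classic Levenshtein dynamic program. This is a stand-in chosen for illustration, not the edit-flow training procedure from the paper, and the function names `edit_script` and `replay` and the example sequences are hypothetical.

```python
from typing import List, Tuple

# (operation, 0-based position in the source sequence, residue or "")
EditOp = Tuple[str, int, str]


def edit_script(src: str, dst: str) -> List[EditOp]:
    """Return a shortest list of edits (Levenshtein) that turns src into dst."""
    n, m = len(src), len(dst)
    # dist[i][j] = edit distance between src[:i] and dst[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion from src
                dist[i][j - 1] + 1,         # insertion into src
            )

    # Walk back from the bottom-right corner to recover one optimal script.
    edits: List[EditOp] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == dst[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # residues match, no edit needed
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("sub", i - 1, dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("del", i - 1, ""))
            i -= 1
        else:
            edits.append(("ins", i, dst[j - 1]))  # insert before src index i
            j -= 1
    edits.reverse()  # report edits in left-to-right order
    return edits


def replay(src: str, edits: List[EditOp]) -> str:
    """Apply an edit script right to left so source positions stay valid."""
    seq = list(src)
    for op, pos, residue in reversed(edits):
        if op == "sub":
            seq[pos] = residue
        elif op == "del":
            del seq[pos]
        else:  # "ins"
            seq.insert(pos, residue)
    return "".join(seq)


script = edit_script("MKTAYIAK", "MRTAGYIA")
print(len(script))                  # 3
print(replay("MKTAYIAK", script))   # MRTAGYIA
```

Several different scripts of the same length can connect the same pair of sequences, and the dynamic program returns only one of them. A learned model of mutational trajectories would presumably need to account for that ambiguity, though how EvoFlows does so is not covered in the abstract.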