RiboSphere: Learning Unified and Efficient Representations of RNA Structures

arXiv cs.LG / 3/23/2026

📰 News · Models & Research

Key Points

  • RiboSphere introduces discrete geometric representations for RNA structures by combining vector quantization with flow matching to capture motif-level structure.
  • The approach uses a geometric transformer encoder to produce SE(3)-invariant features, which are discretized into a finite vocabulary of latent codes via finite scalar quantization (FSQ; see the sketch after this list).
  • A flow-matching decoder reconstructs atomic coordinates conditioned on these codes, achieving high reconstruction fidelity (RMSD 1.25 Å, TM-score 0.84).
  • Learned discrete codes are enriched for specific RNA motifs and transfer to downstream tasks such as inverse folding and RNA-ligand binding prediction, with strong generalization in data-scarce regimes.
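
The summary doesn't spell out RiboSphere's FSQ configuration, but the mechanism itself is compact: bound each latent dimension, round it to a small fixed number of levels, and read the resulting tuple of levels as a single code index. Here is a minimal PyTorch sketch under those assumptions; the `fsq_quantize` helper, the level counts, and the tensor shapes are illustrative rather than the paper's, and odd level counts are assumed so the rounding grid is symmetric around zero.

```python
import torch

def fsq_quantize(z: torch.Tensor, levels: list[int]) -> tuple[torch.Tensor, torch.Tensor]:
    """Finite scalar quantization sketch (hypothetical helper, not the paper's code).

    Each latent dimension i is squashed with tanh, scaled, and rounded to one of
    levels[i] uniformly spaced values; a straight-through estimator keeps the
    operation differentiable. Assumes odd level counts.
    """
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (L - 1) / 2
    bounded = torch.tanh(z) * half                 # map dim i into (-half_i, half_i)
    rounded = torch.round(bounded)                 # snap to the nearest level
    # Straight-through: forward pass uses rounded values, backward sees identity.
    quantized = bounded + (rounded - bounded).detach()
    # Enumerate the vocabulary: treat per-dim levels as mixed-radix digits.
    digits = (rounded + half).long()               # shift each dim to {0, ..., L_i - 1}
    radix = torch.cumprod(
        torch.cat([torch.ones(1, dtype=L.dtype, device=L.device), L[:-1]]), dim=0
    ).long()
    index = (digits * radix).sum(dim=-1)           # one integer code id per latent vector
    return quantized, index

z = torch.randn(2, 16, 4)                          # (batch, residues, latent dims)
zq, codes = fsq_quantize(z, levels=[7, 5, 5, 5])   # vocabulary size 7*5*5*5 = 875
```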

Abstract

Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce RiboSphere, a framework that learns discrete geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25 Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA-ligand binding prediction, with robust generalization in data-scarce regimes.
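
The abstract describes the decoder only at a high level; the actual network is presumably geometry-aware, but the flow-matching objective it is trained with can be sketched generically. Below is a minimal rectified-flow-style training loss in PyTorch, conditioned on per-residue code embeddings; `FlowDecoder`, its layer sizes, and all tensor shapes are hypothetical stand-ins rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class FlowDecoder(nn.Module):
    """Toy velocity-field decoder (illustrative only): predicts dx/dt from
    noisy coordinates, a time scalar, and per-residue code embeddings."""

    def __init__(self, coord_dim: int = 3, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coord_dim + code_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, coord_dim),
        )

    def forward(self, x_t, t, code_emb):
        t_feat = t.expand(*x_t.shape[:-1], 1)       # broadcast time over residues
        return self.net(torch.cat([x_t, code_emb, t_feat], dim=-1))

def flow_matching_loss(model, x1, code_emb):
    """Linear-interpolant flow matching: regress the constant velocity x1 - x0."""
    x0 = torch.randn_like(x1)                       # noise endpoint
    t = torch.rand(x1.shape[0], 1, 1)               # one time per example
    x_t = (1 - t) * x0 + t * x1                     # point on the straight path
    v_pred = model(x_t, t, code_emb)
    return ((v_pred - (x1 - x0)) ** 2).mean()

decoder = FlowDecoder()
x1 = torch.randn(4, 32, 3)                          # target coords: 4 RNAs, 32 residues
code_emb = torch.randn(4, 32, 64)                   # embeddings of the FSQ code indices
loss = flow_matching_loss(decoder, x1, code_emb)
```

At inference, coordinates would be generated by integrating the learned velocity field from Gaussian noise at t = 0 to t = 1 (for example with a few Euler steps), conditioned on the same code embeddings.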