Efficient Hybrid SE(3)-Equivariant Visuomotor Flow Policy via Spherical Harmonics for Robot Manipulation

arXiv cs.RO / 3/25/2026

Key Points

  • The paper introduces E3Flow, an SO(3)-equivariant hybrid visuomotor policy framework designed to overcome three limitations of prior equivariant diffusion policies: high compute cost, dependence on a single input modality, and instability when combined with fast-sampling methods.
  • E3Flow combines efficient rectified flow with stable, multi-modal equivariant learning using spherical harmonic representations to enforce rigorous rotational equivariance.
  • It proposes an invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual inputs (point clouds and images) and injects additional visual cues into spherical harmonic features.
  • E3Flow is evaluated on 8 simulated manipulation tasks from MimicGen and in 4 real-world experiments. In simulation, it improves average success rate by 3.12% over Spherical Diffusion Policy (SDP) while achieving a 7x inference speedup.
  • The authors provide code via GitHub, positioning E3Flow as a practical trade-off among performance, efficiency, and data efficiency for robotic policy learning.
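
The fast-sampling idea behind rectified flow can be illustrated with a toy example. The sketch below is not the authors' implementation: `euler_sample` and `point_target_velocity` are hypothetical names, and the velocity field is a hand-written stand-in for a learned network. It assumes the straight-line path x_t = (1 - t) x_0 + t x_1 and a target distribution concentrated on a single action, in which case the velocity field is known in closed form and Euler integration reaches the target in very few steps.

```python
import numpy as np

def point_target_velocity(target):
    """Closed-form velocity for a straight-line flow toward one target action.

    Assumption for illustration only: a real policy would replace this with a
    learned, observation-conditioned network.
    """
    def velocity(x, t):
        # On the path x_t = (1 - t) * x0 + t * target, the velocity is
        # target - x0, which can be rewritten as (target - x_t) / (1 - t).
        return (target - x) / (1.0 - t)
    return velocity

def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        x = x + dt * velocity(x, t)
    return x

if __name__ == "__main__":
    target_action = np.array([0.3, -0.2, 0.5])
    noise = np.random.default_rng(0).normal(size=3)
    # Because the flow is a straight line, a single step is already enough.
    print(euler_sample(point_target_velocity(target_action), noise, num_steps=1))
```

With a learned velocity field the paths are only approximately straight, so the number of steps becomes a trade-off between speed and accuracy.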

Abstract

While existing equivariant methods enhance data efficiency, they suffer from high computational intensity, reliance on single-modality inputs, and instability when combined with fast-sampling methods. In this work, we propose E3Flow, a novel framework that addresses the critical limitations of equivariant diffusion policies. E3Flow overcomes these challenges, successfully unifying efficient rectified flow with stable, multi-modal equivariant learning for the first time. Our framework is built upon spherical harmonic representations to ensure rigorous SO(3) equivariance. We introduce a novel invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual modalities (point clouds and images), injecting rich visual cues into the spherical harmonic features. We evaluate E3Flow on 8 manipulation tasks from MimicGen and further conduct 4 real-world experiments to validate its effectiveness in physical environments. Simulation results show that E3Flow achieves a 3.12% improvement in average success rate over the state-of-the-art Spherical Diffusion Policy (SDP) while simultaneously delivering a 7x inference speedup. E3Flow thus demonstrates a new and highly effective trade-off between performance, efficiency, and data efficiency for robotic policy learning. Code: https://github.com/zql-kk/E3Flow.
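
As a rough illustration of what SO(3) equivariance and an invariant enhancement mean in practice, the sketch below checks the property numerically on a toy function. It is an assumption-laden stand-in, not the paper's architecture: `toy_policy`, `random_rotation`, and the scalar `gate` are hypothetical, and the output is a plain 3D vector (which transforms like a degree-1 spherical harmonic feature up to a change of basis). The point it shows is that scaling a rotation-equivariant feature by a rotation-invariant scalar keeps the result equivariant.

```python
import numpy as np

def random_rotation(rng):
    """Draw a 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs; q stays orthogonal
    if np.linalg.det(q) < 0:      # flip one axis if q is a reflection
        q[:, 0] = -q[:, 0]
    return q

def toy_policy(points, gate):
    """Map a point cloud (N, 3) to one 3D vector, equivariantly.

    `gate` is a hypothetical invariant scalar standing in for visual cues
    that do not change when the scene is rotated.
    """
    centered = points - points.mean(axis=0)
    # Distances to the centroid are unchanged by rotation, so these
    # weights are invariant.
    weights = np.exp(-np.linalg.norm(centered, axis=1))
    vector = (weights[:, None] * points).sum(axis=0)  # rotates with the input
    return gate * vector  # invariant scalar times equivariant vector

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(16, 3))
    rot = random_rotation(rng)
    rotated_then_mapped = toy_policy(cloud @ rot.T, gate=0.7)
    mapped_then_rotated = rot @ toy_policy(cloud, gate=0.7)
    print(np.allclose(rotated_then_mapped, mapped_then_rotated))
```

Rotating the input and then applying the function gives the same result as applying the function and then rotating the output, which is the property an equivariant policy is built to guarantee.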