EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation
arXiv cs.CV / 4/3/2026
Key Points
- EgoFlow is a new flow-matching framework for generating physically plausible 6DoF object motion trajectories from egocentric video, tackling the occlusions and fast motion typical of first-person footage as well as the weak physical reasoning of prior generative models.
- The method uses a hybrid Mamba-Transformer-Perceiver architecture to model temporal dynamics together with scene geometry and semantic intent from multimodal egocentric observations.
- EgoFlow introduces gradient-guided inference that enforces differentiable physical constraints (e.g., collision avoidance and motion smoothness) during generation, avoiding post-hoc filtering or extra supervision.
- Experiments on HD-EPIC, EgoExo4D, and HOT3D report improved accuracy, generalization, and physical realism versus diffusion- and transformer-based baselines, including up to a 79% reduction in collision rates.
- The work suggests flow-based generative modeling can scale and provide physically grounded motion understanding for egocentric embodied perception tasks.
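The gradient-guided inference idea in the third bullet can be sketched in a few lines: integrate the flow ODE with Euler steps, and at each step subtract the gradient of a differentiable constraint penalty so samples are steered away from violations during generation rather than filtered afterward. The sketch below is illustrative only, with assumed names throughout: `velocity_field` stands in for the paper's learned velocity network, the unit-sphere penalty in `collision_grad` stands in for a real scene-geometry collision cost, and trajectories are 3D points rather than full 6DoF poses.

```python
import numpy as np

def collision_grad(x):
    # Toy differentiable collision penalty 0.5 * max(0, 1 - |x|)^2,
    # which penalizes points inside a unit sphere at the origin
    # (a stand-in for a real scene-geometry collision cost).
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    inside = np.maximum(0.0, 1.0 - r)
    return -inside * x / np.maximum(r, 1e-8)  # analytic gradient of the penalty

def velocity_field(x, t, target):
    # Placeholder for the learned flow-matching velocity network:
    # here, a straight-line flow that carries x to `target` at t = 1.
    return (target - x) / max(1.0 - t, 1e-3)

def guided_sample(x0, target, steps=100, guidance=2.0):
    # Euler integration of the flow ODE, with the constraint gradient
    # subtracted at every step to steer the sample toward low-penalty
    # (collision-free) trajectories. Returns the full trajectory.
    x, dt, traj = x0.copy(), 1.0 / steps, [x0.copy()]
    for i in range(steps):
        t = i * dt
        x = x + dt * (velocity_field(x, t, target) - guidance * collision_grad(x))
        traj.append(x.copy())
    return np.array(traj)
```

With `guidance=0.0` this reduces to plain flow sampling; a positive `guidance` weight bends the path around the penalized region while the velocity term still pulls it to the target. The paper's actual guidance terms (collision avoidance, motion smoothness) and integration scheme may differ.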
