OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

arXiv cs.RO / 4/21/2026


Key Points

  • The paper introduces OFlow, a framework that improves robotic manipulation by jointly modeling future scene dynamics and task-relevant object information.
  • Unlike prior vision-language-action (VLA) approaches, which treat temporal prediction and object-aware reasoning as largely separate, OFlow unifies both in a shared semantic latent space.
  • OFlow uses temporal flow matching to forecast future latents, then factorizes them into object-aware representations that emphasize physically relevant cues while suppressing task-irrelevant variation.
  • The method conditions continuous action generation on these predicted, object-aware latents, improving control reliability, especially under distribution shifts.
  • Experiments on multiple simulation benchmarks (LIBERO, LIBERO-Plus, MetaWorld, SimplerEnv) and real-world tasks show that object-aware foresight improves robustness and success rates.
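
To make the flow-matching idea in the key points more concrete, here is a minimal NumPy sketch of the generic flow-matching recipe with a straight-line path between two latents. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 4-dimensional latents, and the choice to transport the current latent directly to the future latent are all hypothetical (the summary does not say what source distribution or path OFlow uses, and many flow-matching setups start from noise instead). The "oracle" velocity field stands in for a trained network.

```python
import numpy as np

def interpolate(z0, z1, t):
    """Point on the straight-line path from z0 (t=0) to z1 (t=1)."""
    return (1.0 - t) * z0 + t * z1

def target_velocity(z0, z1):
    """Velocity of the straight-line path; it is constant in t."""
    return z1 - z0

def flow_matching_loss(velocity_fn, z0, z1, t):
    """Mean squared error between predicted and target velocity at time t."""
    z_t = interpolate(z0, z1, t)
    pred = velocity_fn(z_t, t)
    return float(np.mean((pred - target_velocity(z0, z1)) ** 2))

def euler_sample(velocity_fn, z0, num_steps=10):
    """Integrate dz/dt = velocity_fn(z, t) from t=0 to t=1 with Euler steps."""
    z = np.array(z0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        z = z + dt * velocity_fn(z, k * dt)
    return z

rng = np.random.default_rng(0)
z_now = rng.normal(size=4)     # hypothetical stand-in for the current latent
z_future = rng.normal(size=4)  # hypothetical stand-in for the future latent

# Oracle velocity field that already knows the target displacement.
# A trained network would only approximate this (assumption for illustration).
def oracle(z, t):
    return z_future - z_now

loss = flow_matching_loss(oracle, z_now, z_future, t=0.3)
z_pred = euler_sample(oracle, z_now, num_steps=10)
```

With the oracle field the loss is zero and the Euler rollout lands on the future latent up to floating-point error. In a learned setting, the velocity function would be a network conditioned on observations and the instruction, and the loss would be averaged over sampled times and training pairs.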

Abstract

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing vision-language-action (VLA) models face two limitations: they typically act only on the current frame, and future prediction and object-aware reasoning are often learned in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering task-irrelevant variation, and conditions continuous action generation on these predictions. Integrating OFlow into VLA pipelines enables more reliable control under distribution shifts. Extensive experiments across the LIBERO, LIBERO-Plus, MetaWorld, and SimplerEnv benchmarks and on real-world tasks demonstrate that object-aware foresight consistently enhances robustness and success.