GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation
arXiv cs.RO / 4/8/2026
Key Points
- GeoPredict is a geometry-aware Vision-Language-Action (VLA) framework designed to overcome VLA models’ largely reactive, 2D-centric behavior in precision 3D manipulation tasks.
- The method adds (1) a trajectory-level module that uses motion history to predict multi-step 3D arm keypoint trajectories and (2) a predictive 3D Gaussian geometry module that forecasts workspace geometry with track-guided refinement.
- GeoPredict uses its predictive 3D components only for training-time supervision via depth-based rendering; during inference it relies on lightweight query tokens without performing any 3D decoding.
- Experiments on RoboCasa Human-50, LIBERO, and real-world manipulation demonstrate consistent improvements over strong VLA baselines, with the biggest gains in geometry- and space-intensive scenarios.
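
To make the shape of the trajectory-prediction task concrete, here is a minimal sketch that maps a short motion history to multi-step 3D keypoint positions. It is not GeoPredict's learned module: it swaps in a plain constant-velocity extrapolation, and the names `extrapolate_keypoints`, `Frame`, and `Point3D` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Frame = List[Point3D]  # one 3D position per arm keypoint (hypothetical layout)


def extrapolate_keypoints(history: List[Frame], horizon: int) -> List[Frame]:
    """Forecast `horizon` future frames under a constant-velocity assumption.

    Illustrative baseline only; the summarized method uses a learned
    trajectory-level module, which this function does not reproduce.
    """
    if len(history) < 2:
        raise ValueError("need at least two past frames to estimate velocity")
    prev, last = history[-2], history[-1]
    # Per-keypoint velocity estimated from the two most recent frames.
    velocity = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(prev, last)]
    future = []
    for step in range(1, horizon + 1):
        # Advance every keypoint by `step` times its estimated velocity.
        frame = [
            tuple(c + step * v for c, v in zip(point, vel))
            for point, vel in zip(last, velocity)
        ]
        future.append(frame)
    return future


# Two keypoints observed over two past frames, moving along +x and +z.
history = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.25), (1.5, 0.0, 0.25)],
]
predicted = extrapolate_keypoints(history, horizon=2)
```

A learned predictor would replace the velocity estimate with a model conditioned on a longer history and on visual and language inputs. The interface stays the same: past keypoint frames in, a fixed horizon of future keypoint frames out.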