Forecasting Motion in the Wild
arXiv cs.CV / 4/2/2026
Key Points
- The paper argues that visual intelligence needs a general representation for forecasting agents’ future behavior, which current vision systems lack.
- It proposes dense point trajectories as "visual tokens," a mid-level representation that separates motion from appearance and generalizes across diverse non-rigid agents (e.g., animals in the wild).
- The authors introduce a diffusion transformer that models unordered sets of trajectory tokens and explicitly handles occlusion to produce coherent motion forecasts.
- To support large-scale evaluation, they curate a 300-hour unconstrained animal video dataset with shot detection and camera-motion compensation.
- Experiments indicate that trajectory-token forecasting is category-agnostic and data-efficient, outperforms prior baselines, and generalizes to rare species and morphologies, supporting predictive visual intelligence in real-world settings.
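
The trajectory-token idea from the points above can be sketched concretely: each tracked point yields one token that encodes its normalized positions over time plus a per-frame visibility flag, giving the forecaster an unordered set of motion tokens with occlusion represented explicitly. This is a minimal illustrative sketch; the function name and exact encoding are assumptions, not the paper's published format.

```python
import numpy as np

def trajectories_to_tokens(tracks, visible, hw):
    """Pack dense point trajectories into an unordered set of motion tokens.

    tracks:  (N, T, 2) array of (x, y) pixel positions for N points over T frames.
    visible: (N, T) boolean array; False where the point is occluded.
    hw:      (height, width) of the video frame, used to normalize coordinates.

    Returns an (N, T*3) array: each token concatenates the normalized
    trajectory with a per-frame visibility channel, so occlusion is carried
    in the token rather than silently dropped. (Hypothetical encoding.)
    """
    h, w = hw
    xy = tracks.astype(np.float32) / np.array([w, h], dtype=np.float32)
    xy = np.where(visible[..., None], xy, 0.0)   # zero out occluded frames
    vis = visible.astype(np.float32)[..., None]  # (N, T, 1) visibility channel
    tokens = np.concatenate([xy, vis], axis=-1)  # (N, T, 3)
    return tokens.reshape(tokens.shape[0], -1)   # flatten time: (N, T*3)

# Usage: 4 points tracked over 5 frames in a 640x480 video.
tracks = np.random.rand(4, 5, 2) * np.array([640, 480])
visible = np.ones((4, 5), dtype=bool)
visible[0, 2] = False  # simulate one occluded frame
tokens = trajectories_to_tokens(tracks, visible, (480, 640))
```

Because the tokens form a set with no inherent ordering, a downstream transformer can process them without positional encodings over the set dimension, which is consistent with the paper's framing of unordered trajectory tokens.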