Forecasting Motion in the Wild

arXiv cs.CV / 4/2/2026



Key Points

The paper argues that visual intelligence needs a general representation for forecasting agents’ future behavior, which current vision systems lack.
It proposes dense point trajectories as “visual tokens”: a mid-level representation that separates motion from appearance and generalizes across diverse non-rigid agents (e.g., animals in the wild).
The authors introduce a diffusion transformer that models unordered sets of trajectory tokens and explicitly handles occlusion to produce coherent motion forecasts.
To support large-scale evaluation, they curate a 300-hour unconstrained animal video dataset with shot detection and camera-motion compensation.
Experiments indicate that trajectory-token forecasting is category-agnostic and data-efficient, outperforms prior baselines, and generalizes to rare species and morphologies. The authors present this as a foundation for predictive visual intelligence in real-world settings.
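To make the "trajectory tokens as an unordered set" idea concrete, here is a minimal sketch in NumPy. It is not the authors' architecture: the token layout (displacements plus visibility flags), the function names, and the single attention layer are all assumptions chosen for illustration. It shows two things: a token built only from point coordinates and occlusion flags carries no appearance information, and self-attention without positional encoding treats the tokens as a set, so reordering the input trajectories simply reorders the outputs.

```python
import numpy as np

def trajectories_to_tokens(points, visible):
    """Turn tracked points into one token per trajectory (hypothetical layout).

    points:  (N, T, 2) pixel coordinates of N tracked points over T frames.
    visible: (N, T) booleans, False where a point is occluded.
    Returns an (N, 3*T) array: flattened displacements followed by visibility flags.
    """
    points = np.asarray(points, dtype=np.float64)
    visible = np.asarray(visible, dtype=bool)
    # Displacement relative to the first frame drops absolute image position.
    # Simplification: this assumes the first-frame position is reliable.
    disp = points - points[:, :1, :]
    # Zero the coordinates at occluded steps; the flags record that they are missing.
    disp = np.where(visible[..., None], disp, 0.0)
    n = points.shape[0]
    return np.concatenate([disp.reshape(n, -1), visible.astype(np.float64)], axis=1)

def set_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention with no positional encoding.

    Without positional encoding the layer is permutation-equivariant:
    shuffling the input rows shuffles the output rows the same way.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

# Small demonstration with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(5, 4, 2))
visible = rng.random((5, 4)) > 0.2
tokens = trajectories_to_tokens(points, visible)

d = tokens.shape[1]
w_q, w_k, w_v = (rng.standard_normal((d, 8)) * 0.1 for _ in range(3))
out = set_attention(tokens, w_q, w_k, w_v)

perm = rng.permutation(tokens.shape[0])
out_perm = set_attention(tokens[perm], w_q, w_k, w_v)
print("permutation-equivariant:", np.allclose(out_perm, out[perm]))
```

A real model would stack many such layers and condition them on a diffusion timestep; the point of the sketch is only that the set structure comes from omitting any ordering signal over trajectories.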

Abstract

Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons about occlusion, enabling coherent forecasts of complex motion patterns. To evaluate at scale, we curate 300 hours of unconstrained animal video with robust shot detection and camera-motion compensation. Experiments show that forecasting trajectory tokens achieves category-agnostic, data-efficient prediction, outperforms state-of-the-art baselines, and generalizes to rare species and morphologies, providing a foundation for predictive visual intelligence in the wild.
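The abstract does not spell out the diffusion objective, so the following is only a sketch of how a standard DDPM-style training step could be applied to trajectory data. The linear noise schedule, the function names, and the choice to ignore occluded steps in the loss are assumptions for illustration, not details taken from the paper. The forward process used is the usual closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t from clean future trajectories x0 at diffusion step t.

    x0: (N, T, 2) future point positions (or displacements).
    Returns the noised trajectories and the noise that was added.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def masked_noise_loss(eps_pred, eps, visible):
    """Mean squared error on the predicted noise, counting visible steps only.

    eps_pred, eps: (N, T, 2); visible: (N, T) booleans.
    Occluded steps have no trustworthy ground truth, so this sketch excludes them.
    """
    mask = np.asarray(visible, dtype=np.float64)[..., None]
    squared_error = (eps_pred - eps) ** 2 * mask
    denom = max(mask.sum() * eps.shape[-1], 1.0)  # avoid division by zero
    return squared_error.sum() / denom

# Example: noise a batch of 5 trajectories over 8 future frames at step t = 500.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((5, 8, 2))
visible = rng.random((5, 8)) > 0.3
xt, eps = add_noise(x0, 500, alpha_bar, rng)

# A trained denoiser would predict eps from (xt, t); zeros stand in for it here.
eps_pred = np.zeros_like(eps)
print("loss with a placeholder denoiser:", masked_noise_loss(eps_pred, eps, visible))
```

In practice the placeholder prediction would come from the transformer, which takes the noised future tokens, the observed past tokens, and the step index as input.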


