TrajLoom: Dense Future Trajectory Generation from Video

arXiv cs.CV, March 25, 2026


Key Points

  • TrajLoom is an arXiv framework for predicting future dense point trajectories (including visibility) from observed video context and past trajectories, targeting motion forecasting and controllable video generation.
  • The method combines three key modules: Grid-Anchor Offset Encoding, which reduces spatial bias; TrajLoom-VAE, which learns a compact spatiotemporal latent space via masked reconstruction and consistency regularization; and TrajLoom-Flow, which generates future trajectories in that latent space using flow matching, stabilized by boundary cues and K-step on-policy fine-tuning.
  • The paper introduces TrajLoomBench, a unified benchmark covering both real and synthetic videos under a standardized evaluation setup aligned with video-generation benchmarks.
  • Compared with prior state-of-the-art approaches, TrajLoom extends the prediction horizon from 24 to 81 frames while improving motion realism and stability across multiple datasets, and its outputs can be used directly for downstream video generation and editing.
  • Code, model checkpoints, and datasets are released via the project website, enabling replication and further research.
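The Grid-Anchor Offset Encoding from the first module can be sketched roughly as follows. The paper's exact formulation may differ; the function names and the clipping behavior here are illustrative assumptions, but the core idea matches the description above: each point is stored as a small residual relative to the center of the pixel it falls in, so the representation looks the same everywhere on the grid.

```python
import numpy as np

def encode_offsets(points, H, W):
    """Represent each 2D point as an offset from the center of the pixel
    (grid cell) it falls in, reducing location-dependent bias.

    points: (N, 2) array of (x, y) positions in pixel coordinates.
    Returns integer anchor cell indices and residual offsets; for points
    inside the image, each offset component lies in [-0.5, 0.5).
    """
    anchors = np.floor(points).astype(np.int64)       # grid cell index
    anchors = np.clip(anchors, 0, [W - 1, H - 1])     # keep inside image bounds
    centers = anchors + 0.5                           # pixel-center anchor
    offsets = points - centers                        # residual offset
    return anchors, offsets

def decode_offsets(anchors, offsets):
    """Invert the encoding: absolute position = anchor center + offset."""
    return anchors + 0.5 + offsets
```

For example, a point at (10.3, 5.9) is anchored to cell (10, 5) with offset (-0.2, 0.4), and decoding recovers the original coordinates exactly.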

Abstract

Predicting future motion is crucial in video understanding and controllable video generation. Dense point trajectories are a compact, expressive motion representation, but modeling their future evolution from observed video remains challenging. We propose a framework that predicts future trajectories and visibility from past trajectories and video context. Our method has three components: (1) Grid-Anchor Offset Encoding, which reduces location-dependent bias by representing each point as an offset from its pixel-center anchor; (2) TrajLoom-VAE, which learns a compact spatiotemporal latent space for dense trajectories with masked reconstruction and a spatiotemporal consistency regularizer; and (3) TrajLoom-Flow, which generates future trajectories in latent space via flow matching, with boundary cues and on-policy K-step fine-tuning for stable sampling. We also introduce TrajLoomBench, a unified benchmark spanning real and synthetic videos with a standardized setup aligned with video-generation benchmarks. Compared with state-of-the-art methods, our approach extends the prediction horizon from 24 to 81 frames while improving motion realism and stability across datasets. The predicted trajectories directly support downstream video generation and editing. Code, model checkpoints, and datasets are available at https://trajloom.github.io/.
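TrajLoom-Flow's generation step follows the standard flow-matching recipe: learn a velocity field over latents and integrate it from Gaussian noise to data. The paper's network, conditioning, and sampler are not spelled out in this summary, so the sketch below stubs the learned model with a generic `velocity_fn` and uses plain Euler integration; TrajLoom would additionally condition on video context and boundary cues, and applies K-step on-policy fine-tuning for stability.

```python
import numpy as np

def sample_flow_matching(velocity_fn, shape, n_steps=50, rng=None):
    """Generic flow-matching sampler: integrate dz/dt = v(z, t) with Euler
    steps from t=0 (Gaussian noise) to t=1 (a data latent).

    velocity_fn(z, t) stands in for the learned velocity network.
    """
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(shape)        # start from noise at t = 0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        z = z + dt * velocity_fn(z, t)    # one Euler integration step
    return z
```

As a toy sanity check, the optimal-transport field toward a single fixed latent `z1` is `v(z, t) = (z1 - z) / (1 - t)`; integrating it drives any noise sample to `z1` by `t = 1`.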