Recall to Predict: Grounding Motion Forecasting in Interpretable Motion Bank

arXiv cs.CV / 5/5/2026

Key Points

Motion forecasting models often trade interpretability for predictive accuracy: anchor-based architectures rely either on opaque latent queries that are prone to latent collapse or on naive trajectory sampling that limits multi-modal diversity.
The proposed “Recall to Predict” framework grounds predictions in an interpretable Motion Bank: a structured embedding space of physically realizable trajectories learned via contrastive learning.
It introduces an Anchor Retrieval Layer that retrieves motion priors through dual-level gated cross-attention and uses a Straight-Through Gumbel-Softmax estimator to keep gradients flowing during discrete trajectory selection.
Retrieved motion primitives are further refined with a DETR-style decoder and trained jointly using a Winner-Takes-All kinematic Gaussian Mixture Model, diversity regularization, and a soft-min endpoint loss.
The method reports competitive multi-modal forecasting accuracy on the Argoverse 2 and Waymo Open Motion datasets, and the code is publicly available on GitHub.
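
The discrete selection step named in the third point can be illustrated with a small sketch. It uses only the Python standard library and shows the forward pass of Gumbel-Softmax sampling with a hard (one-hot) output. The helper name `gumbel_softmax_select`, the temperature `tau`, and the toy logits are assumptions made for illustration and are not taken from the paper's code. In an autodiff framework the straight-through output is commonly written as `y_hard - stop_gradient(y_soft) + y_soft`, so the forward pass sees the one-hot vector while gradients follow the soft probabilities.

```python
import math
import random


def gumbel_softmax_select(logits, tau=1.0, rng=random):
    """Sample a hard one-hot selection plus its soft relaxation.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Perturb each logit with Gumbel(0, 1) noise: g = -log(-log(u)).
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    y_soft = [e / total for e in exps]

    # Hard one-hot at the argmax: the value a forward pass would use.
    index = max(range(len(y_soft)), key=y_soft.__getitem__)
    y_hard = [1.0 if i == index else 0.0 for i in range(len(y_soft))]
    return index, y_hard, y_soft


# Toy usage: choose one of three candidate anchors from made-up scores.
index, y_hard, y_soft = gumbel_softmax_select(
    [2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0)
)
```

Lower values of `tau` push `y_soft` closer to the one-hot vector, at the cost of noisier gradients in a trainable setting.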

Abstract

Motion forecasting often requires trading interpretability for predictive accuracy. Standard anchor-based architectures rely on opaque latent queries that are highly prone to latent collapse, or naive trajectory sampling that limits multi-modal diversity. We propose an end-to-end differentiable framework that grounds predictions in a comprehensive "motion bank", a structured embedding space of physically realizable trajectories constructed via contrastive learning. Rather than regressing paths from a blank slate, our architecture dynamically retrieves explicit motion priors using a novel Anchor Retrieval Layer. This module adapts orthogonally initialized queries via a Dual-Level Gated Cross-Attention mechanism and executes discrete trajectory selection using a Straight-Through Gumbel-Softmax estimator to preserve continuous gradient flow. The retrieved semantically grounded anchors are then geometrically refined by a DETR-style decoder, optimized jointly with a Winner-Takes-All (WTA) kinematic Gaussian Mixture Model (GMM), a latent diversity penalty, and a soft-min weighted endpoint loss. By strictly conditioning the decoding phase on diverse, interpretable motion primitives, our approach eliminates the "black box" of standard latent queries while achieving competitive multi-modal accuracy on the Argoverse 2 and Waymo Open Motion datasets. Code is available at: https://github.com/abviv/recall2predict
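
As a rough illustration of two of the training ideas named in the abstract, the sketch below shows a Winner-Takes-All mode selection and a soft-min weighted endpoint loss over K candidate trajectories. The abstract does not give the exact formulations, so everything here is an assumption: the winner is chosen by average displacement error rather than by a GMM likelihood, the soft-min weights are `softmax(-d / temperature)` over endpoint distances, and the function names and `temperature` parameter are hypothetical.

```python
import math


def endpoint_errors(modes, target):
    """Euclidean distance from each mode's final point to the target's final point."""
    tx, ty = target[-1]
    return [math.hypot(mode[-1][0] - tx, mode[-1][1] - ty) for mode in modes]


def winner_index(modes, target):
    """Winner-Takes-All: index of the mode with the lowest average displacement."""
    def average_displacement(mode):
        return sum(
            math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(mode, target)
        ) / len(target)

    return min(range(len(modes)), key=lambda k: average_displacement(modes[k]))


def softmin_endpoint_loss(modes, target, temperature=1.0):
    """Endpoint errors averaged with soft-min weights softmax(-d / temperature)."""
    distances = endpoint_errors(modes, target)
    lowest = min(distances)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - lowest) / temperature) for d in distances]
    normalizer = sum(weights)
    return sum(w / normalizer * d for w, d in zip(weights, distances))


# Toy usage: two candidate modes against a two-step ground-truth path.
target = [(1.0, 0.0), (2.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0)],  # matches the target exactly
    [(1.0, 1.0), (2.0, 2.0)],  # drifts away from the target
]
best = winner_index(modes, target)
loss = softmin_endpoint_loss(modes, target, temperature=1.0)
```

Under these assumptions, a small `temperature` makes the loss approach the best mode's endpoint error, while a large one approaches the plain mean over modes, so non-winning modes still receive some training signal.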
