CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting

arXiv cs.CV / 4/20/2026


Key Points

The paper introduces CollideNet, a hierarchical spatiotemporal transformer architecture designed specifically for time-to-collision (TTC) forecasting from video.
CollideNet uses a spatial stream that aggregates per-frame information at multiple resolutions and a temporal stream that performs multi-scale feature encoding.
The temporal modeling includes disentanglement of non-stationarity, trend, and seasonality components to better capture time-varying dynamics.
The authors report state-of-the-art results on three public TTC-related datasets, outperforming prior methods by a sizable margin, and release code for reproducibility.
Cross-dataset evaluations and visualizations are used to study generalization and the effect of the trend/seasonality disentanglement.
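This summary does not describe how CollideNet's disentanglement modules are built. As a hedged illustration only, the sketch below uses two common time-series techniques applied to a 1-D sequence of per-frame features. The first is instance normalization (RevIN-style, without a learned affine) to remove a non-stationary level and scale. The second is a moving-average series decomposition (as popularized by Autoformer) that splits the sequence into a smooth trend and a residual "seasonal" part. The function names, the window size, and the choice of these particular techniques are assumptions for illustration, not CollideNet's actual implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def normalize(series: Sequence[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Hypothetical helper: strip a non-stationary level/scale (RevIN-style, no learned affine)."""
    mu = mean(series)
    sigma = pstdev(series)
    normed = [(x - mu) / (sigma + eps) for x in series]
    return normed, mu, sigma


def denormalize(series: Sequence[float], mu: float, sigma: float, eps: float = 1e-5) -> List[float]:
    """Hypothetical helper: restore the original level/scale after modeling."""
    return [x * (sigma + eps) + mu for x in series]


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Centered moving average; edges are padded by repeating the first/last value."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [mean(padded[i:i + window]) for i in range(len(series))]


def decompose(series: Sequence[float], window: int = 5) -> Tuple[List[float], List[float]]:
    """Split a sequence into trend (moving average) and seasonal (residual) parts."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal
```

For example, a per-frame signal with a slow upward drift plus an alternating oscillation can be normalized with `normalize(signal)` and then split with `decompose(normed, window=5)`; by construction, `trend[i] + seasonal[i]` reproduces the normalized input exactly, so the split loses no information.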

Abstract

Time-to-Collision (TTC) forecasting is a critical task in collision prevention, requiring precise temporal prediction and an understanding of both local and global patterns in a video, spatially as well as temporally. To address the multi-scale nature of video, we introduce CollideNet, a novel hierarchical spatiotemporal transformer-based architecture specifically tailored for effective TTC forecasting. In the spatial stream, CollideNet aggregates information for each video frame simultaneously at multiple resolutions. In the temporal stream, in addition to multi-scale feature encoding, CollideNet disentangles the non-stationarity, trend, and seasonality components. Our method sets a new state of the art on three commonly used public datasets, outperforming prior works by a considerable margin. We conduct cross-dataset evaluations to analyze the generalization capabilities of our method, and visualize the effects of disentangling the trend and seasonality components of the video data. We release our code at https://github.com/DeSinister/CollideNet/.
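The abstract states that the spatial stream aggregates each frame at multiple resolutions and the temporal stream encodes features at multiple scales, but it does not say how. The sketch below is a minimal illustration under stated assumptions. It swaps in spatial-pyramid-style average pooling for the spatial side and non-overlapping window averaging at several strides for the temporal side, rather than CollideNet's transformer blocks. All function names, grid sizes, and scales are hypothetical; the point is only to show what "the same input summarized at coarse and fine granularity" looks like.

```python
from typing import List, Sequence

Frame = List[List[float]]


def grid_pool(frame: Frame, cells: int) -> List[float]:
    """Average-pool an H x W frame onto a cells x cells grid (H, W divisible by cells)."""
    h, w = len(frame), len(frame[0])
    if h % cells or w % cells:
        raise ValueError("frame size must be divisible by the grid size")
    ch, cw = h // cells, w // cells
    pooled = []
    for gy in range(cells):  # grid rows, top to bottom
        for gx in range(cells):  # grid columns, left to right
            block = [frame[y][x]
                     for y in range(gy * ch, (gy + 1) * ch)
                     for x in range(gx * cw, (gx + 1) * cw)]
            pooled.append(sum(block) / len(block))
    return pooled


def multi_resolution_tokens(frame: Frame, grids: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate coarse-to-fine summaries of one frame (hypothetical spatial aggregation)."""
    tokens: List[float] = []
    for cells in grids:
        tokens.extend(grid_pool(frame, cells))
    return tokens


def multi_scale_temporal(features: Sequence[float],
                         scales: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Average non-overlapping windows of length s for each scale s.

    Trailing frames that do not fill a complete window are dropped.
    """
    out = []
    for s in scales:
        n = len(features) // s
        out.append([sum(features[i * s:(i + 1) * s]) / s for i in range(n)])
    return out
```

With a 4 x 4 frame and grids `(1, 2, 4)`, `multi_resolution_tokens` returns 1 + 4 + 16 = 21 values: one global average, four quadrant averages, and the 16 original pixels. Similarly, `multi_scale_temporal` over 8 per-frame features yields sequences of length 8, 4, and 2, so coarser scales capture slower dynamics. A learned model would replace these fixed averages with attention or convolution, but the coarse-to-fine structure is the same idea.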
