Diffusion Masked Pretraining for Dynamic Point Cloud

arXiv cs.CV / 5/6/2026


Key Points

  • The paper argues that dynamic point cloud pretraining is still largely based on masked reconstruction objectives, but existing approaches suffer from spatio-temporal positional leakage and overly deterministic motion supervision.
  • It proposes Diffusion Masked Pretraining (DiMP), which integrates diffusion modeling into both positional estimation and motion learning within a unified self-supervised framework.
  • DiMP applies forward diffusion noise only to masked tube centers and then predicts clean centers from visible spatio-temporal context, removing positional leakage while keeping visible coordinates as reliable temporal anchors.
  • For motion learning, DiMP replaces deterministic inter-frame displacement targets with a DDPM noise-prediction objective, encouraging the encoder to model the full conditional distribution of plausible motions rather than collapsing to conditional means.
  • Experiments show consistent downstream improvements over the backbone alone, including absolute gains of 11.21% for offline action segmentation and 13.65% for causally constrained online inference, and the authors release code on GitHub.
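The asymmetric noising step described above (diffuse only the masked tube centers, keep visible centers clean) can be sketched as follows. This is a minimal illustration under standard DDPM forward-process assumptions, not the authors' implementation; the function name, argument layout, and schedule are all hypothetical.

```python
import numpy as np

def forward_diffuse_masked_centers(centers, mask, t, betas, rng):
    """Apply DDPM forward noise only to masked tube centers (sketch).

    centers: (N, 3) array of spatio-temporal tube centers
    mask:    (N,) boolean, True where the tube is masked
    t:       diffusion timestep index
    betas:   (T,) variance schedule
    rng:     numpy random Generator
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]            # cumulative product \bar{alpha}_t
    noise = rng.standard_normal(centers.shape)   # epsilon ~ N(0, I)
    # Standard closed-form forward process: x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps
    noisy = np.sqrt(alpha_bar) * centers + np.sqrt(1.0 - alpha_bar) * noise
    # Visible centers stay clean so they can serve as reliable temporal anchors.
    out = np.where(mask[:, None], noisy, centers)
    return out, noise
```

A decoder would then be trained to predict the clean masked centers from the visible spatio-temporal context, so no ground-truth positions of masked tubes ever reach the decoder as positional embeddings.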

Abstract

Dynamic point cloud pretraining is still dominated by masked reconstruction objectives. However, these objectives inherit two key limitations. Existing methods inject ground-truth tube centers as decoder positional embeddings, causing spatio-temporal positional leakage. Moreover, they supervise inter-frame motion with deterministic proxy targets that systematically discard distributional structure by collapsing multimodal trajectory uncertainty into conditional means. To address these limitations, we propose Diffusion Masked Pretraining (DiMP), a unified self-supervised framework for dynamic point clouds. DiMP introduces diffusion modeling into both positional inference and motion learning. It first applies forward diffusion noise only to masked tube centers, then predicts clean centers from visible spatio-temporal context. This removes positional leakage while preserving visible coordinates as clean temporal anchors. DiMP also reformulates point-wise inter-frame displacement supervision as a DDPM noise-prediction objective conditioned on decoded representations. This design drives the encoder to target the full conditional distribution of plausible motions under a variational surrogate, rather than collapsing to a single deterministic estimate. Extensive experiments demonstrate that DiMP consistently improves downstream accuracy over the backbone alone, with absolute gains of 11.21% on offline action segmentation and 13.65% under causally constrained online inference. Code is available at https://github.com/InitalZ/DiMP.git.
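The DDPM noise-prediction objective for motion described in the abstract can be sketched as a standard epsilon-prediction loss on inter-frame displacements. This is a hedged illustration, not the paper's code: `eps_model`, the conditioning signal `cond`, and the schedule are assumed placeholders, and the real objective is conditioned on decoded representations from the pretraining decoder.

```python
import numpy as np

def ddpm_motion_loss(displacements, cond, eps_model, t, alpha_bars, rng):
    """Epsilon-prediction (noise-prediction) objective on displacements (sketch).

    displacements: (N, 3) ground-truth inter-frame displacements (x_0)
    cond:          conditioning features, e.g. decoded token representations
    eps_model:     callable (x_t, t, cond) -> predicted noise, same shape as x_t
    t:             diffusion timestep index
    alpha_bars:    (T,) cumulative products of (1 - beta_t)
    """
    a_bar = alpha_bars[t]
    eps = rng.standard_normal(displacements.shape)  # epsilon ~ N(0, I)
    # Forward process sample: x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps
    x_t = np.sqrt(a_bar) * displacements + np.sqrt(1.0 - a_bar) * eps
    eps_hat = eps_model(x_t, t, cond)
    # Simple L2 loss between predicted and true noise (the DDPM surrogate).
    return float(np.mean((eps_hat - eps) ** 2))
```

Because the model is trained to match the injected noise rather than a single regression target, minimizing this surrogate fits the full conditional distribution of plausible motions instead of collapsing to its mean.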