MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy
arXiv cs.RO / 4/23/2026
Key Points
- MATT-Diff is a diffusion-policy-based control method for active multi-target tracking with a mobile agent that can handle exploration, tracking, and target reacquisition without knowing the number, states, or dynamics of targets in advance.
- The approach balances uncertainty reduction for detected-but-uncertain targets against exploration for undetected or lost targets, so the agent can switch between tracking and exploring as the situation changes.
- The paper builds a demonstration dataset using three expert planners (frontier-based exploration, an uncertainty-based exploration/tracking switcher, and a time-based exploration/reacquisition switcher), so that the demonstrations cover several distinct behavior modes for the policy to learn.
- MATT-Diff uses a vision transformer for egocentric map tokenization and an attention mechanism to fuse variable target estimates modeled as Gaussian densities, learning multimodal action sequences via a diffusion denoising process.
- Experiments show improved tracking performance over learning-based baselines in new environments, and the multimodal behaviors reflect the diversity of the expert planners; the code is released on GitHub.
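
To make the idea of an uncertainty-based exploration/tracking switcher more concrete, here is a minimal sketch in Python. It is not the paper's planner: the summary does not state the switching rule, so the log-determinant uncertainty measure, the threshold value, and the names `TargetEstimate`, `log_det_2x2`, and `choose_mode` are all assumptions chosen for illustration. The only element taken from the summary is that each target estimate is modeled as a Gaussian density.

```python
import math
from dataclasses import dataclass


@dataclass
class TargetEstimate:
    """Hypothetical 2D Gaussian estimate of one target's position."""
    mean: tuple  # (x, y)
    cov: tuple   # ((sxx, sxy), (syx, syy)), a 2x2 covariance matrix


def log_det_2x2(cov):
    """Log-determinant of a 2x2 covariance, used here as the uncertainty score."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return math.log(det)


def choose_mode(estimates, log_det_threshold=0.0):
    """Pick a behavior mode from the current target estimates.

    Assumed rule (not from the paper): if no target is known, explore.
    Otherwise find the most uncertain target; if its score exceeds the
    threshold, track it to reduce uncertainty, else go back to exploring.
    Returns (mode, index_of_target_or_None).
    """
    if not estimates:
        return ("explore", None)
    scores = [log_det_2x2(e.cov) for e in estimates]
    worst = max(range(len(scores)), key=scores.__getitem__)
    if scores[worst] > log_det_threshold:
        return ("track", worst)
    return ("explore", None)


if __name__ == "__main__":
    confident = TargetEstimate((0.0, 0.0), ((0.1, 0.0), (0.0, 0.1)))
    uncertain = TargetEstimate((5.0, 5.0), ((4.0, 0.0), (0.0, 4.0)))
    print(choose_mode([]))
    print(choose_mode([confident]))
    print(choose_mode([confident, uncertain]))
```

In this sketch an agent with only well-localized targets keeps exploring, while a single poorly localized target pulls it into tracking. A learned policy such as the one described above would not hard-code this rule; it would instead imitate demonstrations produced by planners of this kind.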