Multi-view Crowd Tracking Transformer with View-Ground Interactions Under Large Real-World Scenes
arXiv cs.CV / 4/22/2026
Key Points
- The paper introduces MVTrackTrans, a Transformer-based multi-view crowd tracking model that improves tracking by modeling interactions between camera views and the ground plane.
- Prior CNN-based multi-view crowd tracking approaches have been evaluated mainly on small datasets (e.g., Wildtrack, MultiviewX), which makes them hard to apply to real-world scenarios with larger spaces and heavy occlusion.
- To address this gap, the authors collect and annotate two new large-scale real-world multi-view tracking datasets, MVCrowdTrack and CityTrack, spanning larger scene sizes and longer time periods.
- Experiments on the new large datasets show that MVTrackTrans outperforms existing methods, indicating the approach is well suited to complex, large real-world scenes.
- The datasets and code are released publicly via the provided GitHub repository to support further research and practical deployment of multi-view crowd tracking.