VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation
arXiv cs.CV / 3/16/2026
💬 Opinion · Models & Research
Key Points
- VIRD introduces a cross-view pose estimation approach that learns a view-invariant representation to bridge the gap between ground and satellite imagery.
- It constructs horizontal correspondences by applying a polar transform to the satellite view (see the sketch after this list) and uses context-enhanced positional attention to reduce vertical misalignment.
- A view-reconstruction loss further enforces invariance by encouraging the model to reconstruct both the cross-view and the original image (a loss sketch also follows the list).
- On KITTI, VIRD reduces median position and orientation errors by 50.7% and 76.5%, respectively; on VIGOR, by 18.0% and 46.8%. It does so without relying on orientation priors.
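The summary does not spell out the exact polar transform VIRD uses, but the standard trick in cross-view matching is to resample the overhead image so that azimuth runs along one axis, roughly aligning it with the horizontal axis of a ground panorama. A minimal NumPy sketch of such a nearest-neighbor resampling (the function name and output layout, radius along rows and azimuth along columns, are our own assumptions):

```python
import numpy as np

def polar_transform(sat_img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resample a satellite image into polar coordinates around its center.

    Rows of the output index radius (distance from the image center) and
    columns index azimuth, so azimuthal structure in the overhead view
    roughly lines up with the horizontal axis of a ground panorama.
    (Illustrative sketch; not the paper's exact formulation.)
    """
    h, w = sat_img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)

    # Output sampling grid: radii along rows, azimuth angles along columns.
    radii = np.linspace(0.0, max_r, out_h)
    thetas = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    r_grid, t_grid = np.meshgrid(radii, thetas, indexing="ij")

    # Map each polar sample back to a source pixel (nearest neighbor).
    src_y = np.clip(np.round(cy + r_grid * np.cos(t_grid)).astype(int), 0, h - 1)
    src_x = np.clip(np.round(cx + r_grid * np.sin(t_grid)).astype(int), 0, w - 1)
    return sat_img[src_y, src_x]
```

A transform like this handles only the horizontal axis, which is why the paper pairs it with positional attention to address the remaining vertical misalignment.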
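The precise form of the view-reconstruction loss is likewise not given in the summary. A hedged PyTorch sketch of a dual reconstruction objective consistent with the description, assuming hypothetical decoder heads `dec_self` and `dec_cross` (both placeholders) that map shared features back to images:

```python
import torch.nn.functional as F

def view_reconstruction_loss(feat_ground, feat_sat, dec_self, dec_cross,
                             img_ground, img_sat_polar):
    """Dual reconstruction objective (illustrative, not the paper's exact loss).

    From each view's features, the model must reconstruct both the original
    image and the corresponding image from the other view; if the features
    are truly view-invariant, both reconstructions are possible from the
    same representation. All module names here are assumptions.
    """
    # Reconstruct each view from its own features.
    loss_self = (F.l1_loss(dec_self(feat_ground), img_ground)
                 + F.l1_loss(dec_self(feat_sat), img_sat_polar))
    # Reconstruct the opposite view from the same features.
    loss_cross = (F.l1_loss(dec_cross(feat_ground), img_sat_polar)
                  + F.l1_loss(dec_cross(feat_sat), img_ground))
    return loss_self + loss_cross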