VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation
arXiv cs.CV / 3/16/2026
💬 Opinion · Models & Research
Key Points
- VIRD introduces a cross-view pose estimation approach that learns a view-invariant representation to bridge the gap between ground and satellite imagery.
- It constructs horizontal correspondences by applying a polar transform to the satellite view and uses context-enhanced positional attention to reduce vertical misalignment.
- A view-reconstruction loss further enforces invariance by encouraging the model to reconstruct both the cross-view and original images.
- On KITTI and VIGOR, VIRD reduces median position and orientation errors by 50.7% and 76.5% (KITTI) and by 18.0% and 46.8% (VIGOR), without relying on orientation priors.
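The polar transform in the second point maps the overhead satellite view into a layout whose horizontal axis is azimuth, roughly matching a ground-level panorama. A minimal NumPy sketch of such a transform is below; the function name, output size, and nearest-neighbour sampling are illustrative assumptions, not VIRD's actual implementation, which the summary does not specify.

```python
import numpy as np

def polar_transform(sat, h_out=64, w_out=256):
    """Resample a square satellite image into polar coordinates:
    columns sweep a full 360-degree azimuth, rows run from the image
    edge (top) to its centre (bottom). Illustrative sketch only."""
    s = sat.shape[0]                 # satellite image assumed square (s x s)
    cx = cy = (s - 1) / 2.0          # polar origin at the image centre
    i = np.arange(h_out).reshape(-1, 1)   # row index -> radius
    j = np.arange(w_out).reshape(1, -1)   # column index -> azimuth
    r = (s / 2.0) * (h_out - 1 - i) / h_out   # radius shrinks toward bottom row
    theta = 2.0 * np.pi * j / w_out           # full 360-degree sweep
    # nearest-neighbour lookup back into the satellite image
    ys = np.clip(np.round(cy - r * np.cos(theta)).astype(int), 0, s - 1)
    xs = np.clip(np.round(cx + r * np.sin(theta)).astype(int), 0, s - 1)
    return sat[ys, xs]
```

The bottom row collapses to the satellite image centre (radius zero), which is where the camera is assumed to stand; each column then reads outward along one viewing direction, giving the horizontal correspondences the paper exploits.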