Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality
arXiv cs.CV / 3/17/2026
Key Points
- Int3DNet proposes a scene-aware network that predicts 3D intention areas directly from scene geometry and head-hand motion cues in Mixed Reality.
- The model fuses sparse head-hand motion cues with scene point clouds through cross-attention, allowing it to interpret the user's spatial intention without relying on explicit object-level perception.
- Evaluated on the MoGaze and CIRCLE datasets, Int3DNet delivers consistent 3D intention prediction across time horizons of up to 1500 ms and outperforms baselines in diverse and unseen scenes.
- The authors demonstrate practical usability with an efficient visual question answering (VQA) demo built on the predicted intention areas, illustrating proactive MR interaction.
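The scene-motion fusion described above can be illustrated with a minimal NumPy sketch of scaled dot-product cross-attention in which scene points attend over head-hand motion tokens. This is not Int3DNet's actual architecture. The attention direction (scene points as queries), the raw xyz point inputs, the 9-D motion tokens (head position, head direction, hand position), the random weights, the residual sum, and the top-k "intention area" read-out are all assumptions made for illustration. Names such as `intention_scores` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(scene, motion, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    scene:  (N, 3) scene point coordinates, used as queries (assumption)
    motion: (T, 9) head-hand motion tokens, used as keys and values (assumption)
    """
    q = scene @ w_q                            # (N, d)
    k = motion @ w_k                           # (T, d)
    v = motion @ w_v                           # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, T) similarity of each point to each motion token
    weights = softmax(scores, axis=-1)         # each scene point distributes attention over motion tokens
    return weights @ v, q, weights             # fused (N, d), queries (N, d), attention (N, T)


def intention_scores(scene, motion, params):
    """Hypothetical per-point intention probability from the fused features."""
    fused, q, attn = cross_attention(scene, motion,
                                     params["w_q"], params["w_k"], params["w_v"])
    h = fused + q                              # residual: keep each point's own geometry
    logits = h @ params["w_out"]               # (N,)
    return 1.0 / (1.0 + np.exp(-logits)), attn # sigmoid -> probability per point


rng = np.random.default_rng(0)
d = 16
N, T = 1024, 10          # scene points, motion tokens (e.g. 10 recent frames)
params = {
    "w_q": rng.normal(scale=0.1, size=(3, d)),
    "w_k": rng.normal(scale=0.1, size=(9, d)),
    "w_v": rng.normal(scale=0.1, size=(9, d)),
    "w_out": rng.normal(scale=0.1, size=(d,)),
}
scene = rng.uniform(-2.0, 2.0, size=(N, 3))   # toy point cloud in metres
motion = rng.normal(size=(T, 9))              # toy head/hand cues

probs, attn = intention_scores(scene, motion, params)
top = np.argsort(probs)[-32:]                 # hypothetical "intention area": top-32 points
```

With trained weights, the points with the highest scores would form the predicted 3D intention area. Here the weights are random, so the selected points only demonstrate the data flow, not meaningful predictions.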