Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality
arXiv cs.CV / 3/17/2026
📰 News · Models & Research
Key Points
- Int3DNet proposes a scene-aware network that predicts 3D intention areas directly from scene geometry and head-hand motion cues in Mixed Reality.
- The model uses a cross-attention fusion of sparse motion cues and scene point clouds to interpret user spatial intention without relying on explicit object-level perception.
- It is evaluated on MoGaze and CIRCLE datasets, showing consistent 3D intention prediction performance across time horizons up to 1500 ms and outperforming baselines in diverse and unseen scenes.
- The authors demonstrate practical usability with an efficient intention-area-driven visual question answering demo, showcasing proactive MR interaction.
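The cross-attention fusion described above can be sketched in miniature: sparse motion cues (e.g., head and hand tokens) act as queries that attend over scene point-cloud features serving as keys and values. This is a minimal NumPy illustration of the general mechanism, not the authors' implementation; all dimensions, weight initializations, and the single-head setup are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(motion_feats, scene_feats, d_k=32, seed=0):
    """Fuse sparse motion-cue tokens (queries) with scene point
    features (keys/values) via scaled dot-product cross attention.
    Weight matrices are randomly initialized for this sketch."""
    rng = np.random.default_rng(seed)
    d_m, d_s = motion_feats.shape[1], scene_feats.shape[1]
    W_q = rng.standard_normal((d_m, d_k)) / np.sqrt(d_m)
    W_k = rng.standard_normal((d_s, d_k)) / np.sqrt(d_s)
    W_v = rng.standard_normal((d_s, d_k)) / np.sqrt(d_s)
    Q = motion_feats @ W_q          # (num_cues, d_k)
    K = scene_feats @ W_k           # (num_points, d_k)
    V = scene_feats @ W_v           # (num_points, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return attn @ V                 # each motion cue gathers scene context

# Hypothetical example: 3 motion-cue tokens (head + two hands)
# attending over 1024 scene points with 64-dim point features.
motion = np.random.default_rng(1).standard_normal((3, 16))
scene = np.random.default_rng(2).standard_normal((1024, 64))
fused = cross_attention(motion, scene)
print(fused.shape)  # (3, 32)
```

The output gives each motion cue a scene-conditioned feature, which a downstream head could decode into a 3D intention area without explicit object detection.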