Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints
arXiv cs.CV / 3/13/2026
💬 Opinion · Models & Research
Key Points
- The paper presents a framework for generating egocentric videos from a single reference frame using sparse 3D hand joints as embodiment-agnostic control signals with clear semantic and geometric structure.
- It introduces an occlusion-aware control module that resolves unreliable signals from hidden joints and employs a 3D-based weighting mechanism to handle dynamically occluded target joints during motion propagation.
- The method injects 3D geometric embeddings into the latent space to enforce structural consistency and develops an automated annotation pipeline yielding over one million egocentric video clips with precise hand trajectories, plus a cross-embodiment benchmark.
- Extensive experiments show the approach significantly outperforms state-of-the-art baselines and generalizes well to robotic hands.
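The occlusion-aware weighting idea in the second bullet can be illustrated with a minimal sketch. Everything below is a hypothetical construction, not the paper's actual module: it assumes joints are given in camera coordinates with a visibility flag, and downweights hidden joints by how far they sit behind the nearest visible joint along the camera z-axis, so the generator can discount unreliable control signals.

```python
import numpy as np

def occlusion_weights(joints_cam, visible, alpha=5.0):
    """Hypothetical 3D-based weighting: visible joints get weight 1.0;
    occluded joints decay exponentially with their depth gap behind the
    closest visible joint (a stand-in for the hand surface).

    joints_cam: (J, 3) joint positions in camera coordinates (z = depth).
    visible:    (J,) boolean array, True for unoccluded joints.
    Returns:    (J,) weights in [0, 1]."""
    z = joints_cam[:, 2]
    w = np.ones(len(z))
    if visible.any():
        z_front = z[visible].min()   # depth of the closest visible joint
        hidden = ~visible
        # decay with the (non-negative) depth gap behind the visible front
        w[hidden] = np.exp(-alpha * np.clip(z[hidden] - z_front, 0.0, None))
    else:
        w[:] = 0.0                   # no reliable signal from this hand
    return w

def control_embedding(joints_cam, visible):
    """Per-joint control signal: 3D position concatenated with its
    occlusion weight, yielding a (J, 4) array a generator could consume."""
    w = occlusion_weights(joints_cam, visible)
    return np.concatenate([joints_cam, w[:, None]], axis=1)
```

For example, with one joint 0.3 m behind the visible front and `alpha=5.0`, its weight is `exp(-1.5) ≈ 0.22`, while fully visible joints keep weight 1.0. The exponential decay is an arbitrary but smooth choice; the key property is that weights vanish for deeply occluded joints instead of switching off abruptly.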