HOIGS: Human-Object Interaction Gaussian Splatting

arXiv cs.CV / 4/7/2026

Key Points

The paper introduces Human-Object Interaction Gaussian Splatting (HOIGS) to better reconstruct dynamic scenes where humans interact with objects, a longstanding challenge in computer vision and graphics.
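Unlike prior Gaussian Splatting methods, which either rely on human pose priors while neglecting dynamic objects or approximate all motion within a single deformation field, HOIGS explicitly models interaction-induced deformation between humans and objects through a cross-attention-based HOI module.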
It extracts features with distinct deformation baselines, HexPlane for humans and a Cubic Hermite Spline (CHS) for objects, and integrates these heterogeneous features to capture interdependent motion under occlusion, contact, and object manipulation.
Experiments across multiple datasets reportedly show HOIGS consistently outperforms prior human-centric and 4D Gaussian approaches, emphasizing the value of modeling human-object interactions directly.
The work is a research contribution (an arXiv v1 announcement) that could inform future 4D and interaction-aware neural rendering pipelines.
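To make the object-side baseline more concrete, here is a minimal sketch of a piecewise cubic Hermite spline evaluated as a 3D trajectory over time, which is the interpolation scheme the name CHS refers to. It is an illustration under stated assumptions, not the paper's implementation: the summary does not say how many knots HOIGS uses or whether control points and tangents are learned per Gaussian, so the finite-difference tangents and the names `hermite_eval` and `finite_difference_tangents` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def hermite_basis(s: float) -> Tuple[float, float, float, float]:
    """Cubic Hermite basis functions (h00, h10, h01, h11) at local parameter s in [0, 1]."""
    s2, s3 = s * s, s * s * s
    return (2 * s3 - 3 * s2 + 1, s3 - 2 * s2 + s, -2 * s3 + 3 * s2, s3 - s2)


def finite_difference_tangents(times: Sequence[float], points: Sequence[Vec3]) -> List[Vec3]:
    """Hypothetical helper: estimate knot velocities by finite differences
    (one-sided at the ends, central inside). A learned model might optimize these instead."""
    n = len(points)
    tangents = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dt = times[hi] - times[lo]
        tangents.append(tuple((points[hi][k] - points[lo][k]) / dt for k in range(3)))
    return tangents


def hermite_eval(times, points, tangents, t) -> Vec3:
    """Evaluate a piecewise cubic Hermite curve at time t, clamped to [times[0], times[-1]].

    Requires at least two strictly increasing knot times.
    """
    t = min(max(t, times[0]), times[-1])
    # Segment index i such that times[i] <= t <= times[i + 1].
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    dt = times[i + 1] - times[i]
    s = (t - times[i]) / dt
    h00, h10, h01, h11 = hermite_basis(s)
    p0, p1, m0, m1 = points[i], points[i + 1], tangents[i], tangents[i + 1]
    # Tangents are velocities per unit time, so they are scaled by the segment length dt.
    return tuple(h00 * p0[k] + h10 * dt * m0[k] + h01 * p1[k] + h11 * dt * m1[k] for k in range(3))


times = [0.0, 1.0, 2.0]
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
tangents = finite_difference_tangents(times, points)
print(hermite_eval(times, points, tangents, 0.5))  # (0.5, 0.0, 0.0)
```

With knots on a straight line and consistent tangents, the spline reproduces uniform linear motion, and because neighboring segments share the knot tangent, the trajectory has continuous velocity across knots. A plausible motivation for a spline on the object side, not stated in the summary, is that a few knots can describe a smooth object trajectory compactly, while a grid-based field such as HexPlane can represent denser, non-rigid body deformation.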

Abstract

Reconstructing dynamic scenes with complex human-object interactions is a fundamental challenge in computer vision and graphics. Existing Gaussian Splatting methods either rely on human pose priors while neglecting dynamic objects, or approximate all motions within a single field, limiting their ability to capture interaction-rich dynamics. To address this gap, we propose Human-Object Interaction Gaussian Splatting (HOIGS), which explicitly models interaction-induced deformation between humans and objects through a cross-attention-based HOI module. Distinct deformation baselines are employed to extract features: HexPlane for humans and Cubic Hermite Spline (CHS) for objects. By integrating these heterogeneous features, HOIGS effectively captures interdependent motions and improves deformation estimation in scenarios involving occlusion, contact, and object manipulation. Comprehensive experiments on multiple datasets demonstrate that our method consistently outperforms state-of-the-art human-centric and 4D Gaussian approaches, highlighting the importance of explicitly modeling human-object interactions for high-fidelity reconstruction.
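The abstract names a cross-attention-based HOI module that integrates human and object deformation features but does not describe its layout. The NumPy sketch below shows generic single-head scaled dot-product cross-attention applied in both directions (human features attending to object features and vice versa), followed by a residual add and a linear head that maps fused features to per-Gaussian position offsets. All of this is assumed for illustration: the bidirectional layout, the residual connections, the separate offset heads, the feature sizes, and the names `init_params`, `hoi_fuse`, and `predict_offsets` are hypothetical, and the random inputs merely stand in for features that would come from HexPlane (humans) and the CHS (objects).

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feat, context_feat, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    query_feat: (Nq, C) features of the entity being updated.
    context_feat: (Nc, C) features of the other entity.
    Returns the attended values (Nq, C) and the attention weights (Nq, Nc).
    """
    q = query_feat @ w_q      # (Nq, d)
    k = context_feat @ w_k    # (Nc, d)
    v = context_feat @ w_v    # (Nc, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1 over the context entity
    return attn @ v, attn


def init_params(c: int, d: int, rng: np.random.Generator) -> dict:
    """Hypothetical random weights for both attention directions and two offset heads."""
    def proj():
        scale = 1.0 / np.sqrt(c)
        return (rng.normal(0.0, scale, (c, d)),
                rng.normal(0.0, scale, (c, d)),
                rng.normal(0.0, scale, (c, c)))
    return {"h2o": proj(), "o2h": proj(),
            "w_out_human": rng.normal(0.0, 0.01, (c, 3)),
            "w_out_object": rng.normal(0.0, 0.01, (c, 3))}


def hoi_fuse(human_feat, object_feat, params):
    """Hypothetical bidirectional HOI fusion: each side attends to the other, added residually."""
    human_ctx, _ = cross_attention(human_feat, object_feat, *params["h2o"])
    object_ctx, _ = cross_attention(object_feat, human_feat, *params["o2h"])
    return human_feat + human_ctx, object_feat + object_ctx


def predict_offsets(fused_feat, w_out):
    """Linear head mapping fused features to per-Gaussian xyz offsets, shape (N, 3)."""
    return fused_feat @ w_out


rng = np.random.default_rng(0)
human_feat = rng.normal(size=(100, 32))  # stand-in for HexPlane features of human Gaussians
object_feat = rng.normal(size=(40, 32))  # stand-in for CHS features of object Gaussians
params = init_params(32, 16, rng)
human_fused, object_fused = hoi_fuse(human_feat, object_feat, params)
human_offsets = predict_offsets(human_fused, params["w_out_human"])
object_offsets = predict_offsets(object_fused, params["w_out_object"])
print(human_offsets.shape, object_offsets.shape)  # (100, 3) (40, 3)
```

Because each softmax row sums to one, every human Gaussian receives a convex combination of projected object features (and vice versa). This is one simple way for contact or occlusion cues on one side to influence the deformation estimate on the other; the paper's actual module may differ.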