Amortized Inverse Kinematics via Graph Attention for Real-Time Human Avatar Animation

arXiv cs.CV / 4/21/2026

Key Points

  • The paper introduces IK-GAT, a lightweight graph-attention network that estimates full-body joint orientations from sparse tracked 3D joint positions in a single forward pass for real-time human avatar animation.
  • Instead of iterative inverse-kinematics optimization, IK-GAT performs message passing over the skeletal parent-child graph and predicts rotations in a bone-aligned world-frame representation with an explicitly modeled twist axis.
  • It uses a continuous 6D rotation representation and trains with a geodesic loss on SO(3), optionally adding a forward-kinematics consistency regularizer to improve physical plausibility.
  • The method outputs animation-ready local rotations that can directly drive rigged avatars or be converted to pose parameters of SMPL-like body models; the authors report over 650 FPS on CPU with 374K parameters.
  • The authors report that IK-GAT outperforms VPoser-based per-frame iterative optimization without warm-start at significantly lower cost, and that it is robust to the initial pose and to input noise.
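
The 6D rotation representation and the geodesic distance on SO(3) mentioned in the key points can be illustrated with a short NumPy sketch. This is not the authors' code: the function names `rotation_from_6d` and `geodesic_angle` are hypothetical, and the sketch assumes the common Gram-Schmidt construction of the 6D representation, in which the six numbers are read as two 3D vectors that are orthonormalized into the first two columns of the rotation matrix.

```python
import numpy as np

def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption: x[:3] and x[3:] are two non-parallel, non-zero 3D vectors.
    """
    a1 = np.asarray(x[:3], dtype=float)
    a2 = np.asarray(x[3:], dtype=float)
    b1 = a1 / np.linalg.norm(a1)               # first basis vector
    a2_orth = a2 - np.dot(b1, a2) * b1         # remove the component along b1
    b2 = a2_orth / np.linalg.norm(a2_orth)     # second basis vector
    b3 = np.cross(b1, b2)                      # third vector completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)      # basis vectors become the columns

def geodesic_angle(R_pred, R_true):
    """Rotation angle (radians) of the relative rotation R_pred^T R_true."""
    cos_angle = (np.trace(R_pred.T @ R_true) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Usage: an unnormalized, non-orthogonal 6D input still yields a valid rotation.
R_identity = rotation_from_6d([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])
R_quarter_turn = rotation_from_6d([0.0, 1.0, 0.0, -1.0, 0.0, 0.0])  # 90 degrees about z
angle = geodesic_angle(R_identity, R_quarter_turn)
```

Here `angle` equals pi/2, the rotation angle separating the identity from a quarter turn about the z axis. A training loss of this kind would typically average the angle over joints and samples; how the paper weights or reduces it is not stated in this summary.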

Abstract

Inverse kinematics (IK) is a core operation in animation, robotics, and biomechanics: given Cartesian constraints, recover joint rotations under a known kinematic tree. In many real-time human avatar pipelines, the available signal per frame is a sparse set of tracked 3D joint positions, whereas animation systems require joint orientations to drive skinning. Recovering full orientations from positions is underconstrained, most notably because twist about bone axes is ambiguous, and classical IK solvers typically rely on iterative optimization that can be slow and sensitive to noisy inputs. We introduce IK-GAT, a lightweight graph-attention network that reconstructs full-body joint orientations from 3D joint positions in a single forward pass. The model performs message passing over the skeletal parent-child graph to exploit kinematic structure during rotation inference. To simplify learning, IK-GAT predicts rotations in a bone-aligned world-frame representation anchored to rest-pose bone frames. This parameterization makes the twist axis explicit and is exactly invertible to standard parent-relative local rotations given the kinematic tree and rest pose. The network uses a continuous 6D rotation representation and is trained with a geodesic loss on SO(3) together with an optional forward-kinematics consistency regularizer. IK-GAT produces animation-ready local rotations that can directly drive a rigged avatar or be converted to pose parameters of SMPL-like body models for real-time and online applications. With 374K parameters and over 650 FPS on CPU, IK-GAT outperforms VPoser-based per-frame iterative optimization without warm-start at significantly lower cost, and is robust to initial pose and input noise.
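
The abstract's two structural claims, that world-frame rotations convert exactly to parent-relative local rotations and that twist about a bone axis cannot be recovered from positions alone, can be checked on a toy three-joint chain. The sketch below is illustrative only. It shows the generic world/local relationship on a kinematic tree and does not reproduce the paper's bone-aligned anchoring to rest-pose bone frames, whose exact definition is not given here. All names (`world_to_local`, `local_to_world`, `forward_kinematics`, `axis_rotation`) and the toy skeleton are assumptions.

```python
import numpy as np

def axis_rotation(axis, theta):
    """Rodrigues' formula: rotation by theta radians about the given axis."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def world_to_local(R_world, parents):
    """Parent-relative rotations: R_local[j] = R_world[parent]^T @ R_world[j]."""
    R_local = np.empty_like(R_world)
    for j, p in enumerate(parents):
        R_local[j] = R_world[j] if p < 0 else R_world[p].T @ R_world[j]
    return R_local

def local_to_world(R_local, parents):
    """Inverse of world_to_local; assumes each parent index precedes its children."""
    R_world = np.empty_like(R_local)
    for j, p in enumerate(parents):
        R_world[j] = R_local[j] if p < 0 else R_world[p] @ R_local[j]
    return R_world

def forward_kinematics(R_world, rest_offsets, parents, root_pos):
    """Joint positions: each rest-pose offset is rotated by the parent's world rotation."""
    pos = np.zeros((len(parents), 3))
    for j, p in enumerate(parents):
        pos[j] = root_pos if p < 0 else pos[p] + R_world[p] @ rest_offsets[j]
    return pos

# Toy chain: root -> joint 1 -> joint 2, with both bones along +y in the rest pose.
parents = [-1, 0, 1]
rest_offsets = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
R_world = np.stack([axis_rotation([0, 0, 1], 0.3),
                    axis_rotation([1, 0, 0], 0.5),
                    np.eye(3)])

# The world -> local -> world round trip is exact up to floating-point error.
R_round_trip = local_to_world(world_to_local(R_world, parents), parents)

# Twisting joint 1 about its own bone axis (+y in its frame) moves no joint.
R_twisted = R_world.copy()
R_twisted[1] = R_world[1] @ axis_rotation([0, 1, 0], 1.0)
pos_a = forward_kinematics(R_world, rest_offsets, parents, np.zeros(3))
pos_b = forward_kinematics(R_twisted, rest_offsets, parents, np.zeros(3))
```

In this example `pos_a` and `pos_b` coincide although `R_world[1]` and `R_twisted[1]` differ by one radian of twist, which is why a position-only input leaves the twist underdetermined and a learned prior or an explicit twist parameterization is needed to resolve it.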