Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision

arXiv cs.CV / 4/6/2026


Key Points

  • The paper addresses stable editing of animatable human avatars from sparse supervision (e.g., a few edited keyframes), noting that naive fitting often leads to identity leakage and pose-dependent temporal flicker.
  • It frames these issues as an ill-conditioned inversion problem where the available edited constraints do not fully determine the latent directions needed for the intended change.
  • The proposed solution performs editing as a constrained inversion in a structured, part-specific low-dimensional edit subspace to limit updates that would otherwise alter identity.
  • It introduces a conditioning objective, derived from a local linearization of the decoding-and-rendering pipeline, that yields an edit-subspace information matrix; the spectrum of this matrix predicts stability and guides frame reweighting and keyframe activation.
  • The method operates on small subspace matrices, can be implemented efficiently (e.g., via Hessian-vector products), and improves stability under limited edited supervision.
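
To make the information-matrix idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a toy per-frame "renderer" (a hypothetical `tanh` map standing in for the decoding-and-rendering pipeline), an orthonormal edit-subspace basis `B`, and a Gauss-Newton form `sum_t w_t J_t^T J_t` for the information matrix. All names (`subspace_jacobian`, `information_matrix`, `condition_number`) are illustrative and do not come from the source.

```python
import numpy as np

def subspace_jacobian(render, z0, B, eps=1e-5):
    """Central finite-difference Jacobian of render(z0 + B @ a) w.r.t. a, at a = 0."""
    cols = []
    for j in range(B.shape[1]):
        step = eps * B[:, j]
        cols.append((render(z0 + step) - render(z0 - step)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape: (num_pixels, k)

def information_matrix(jacobians, weights):
    """Weighted Gauss-Newton information matrix sum_t w_t J_t^T J_t, shape (k, k)."""
    return sum(w * J.T @ J for w, J in zip(weights, jacobians))

def condition_number(H):
    """Ratio of largest to smallest eigenvalue of a symmetric PSD matrix."""
    eig = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return eig[-1] / max(eig[0], 1e-12)

# Toy setup (all assumed): latent dim 16, edit subspace dim 3, 32 "pixels", 4 frames.
rng = np.random.default_rng(0)
d, k, p = 16, 3, 32
B = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal edit-subspace basis
z0 = rng.normal(size=d)                         # reconstructed (unedited) latent
A = [rng.normal(size=(p, d)) for _ in range(4)]
renders = [lambda z, A_t=A_t: np.tanh(0.1 * A_t @ z) for A_t in A]

jacs = [subspace_jacobian(r, z0, B) for r in renders]
H = information_matrix(jacs, np.ones(len(jacs)))
print("eigenvalues:", np.linalg.eigvalsh(H))
print("condition number:", condition_number(H))
```

A small smallest eigenvalue means some edit direction is barely constrained by the supervised frames, which is the ill-conditioning the summary describes. Because `H` is only `k x k`, inspecting its spectrum is cheap even when the renderer itself is expensive.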

Abstract

Editing animatable human avatars typically relies on sparse supervision, often a few edited keyframes, yet naively fitting a reconstructed avatar to these edits frequently causes identity leakage and pose-dependent temporal flicker. We argue that these failures are best understood as an ill-conditioned inversion: the available edited constraints do not sufficiently determine the latent directions responsible for the intended edit. We propose a conditioning-guided edited reconstruction framework that performs editing as a constrained inversion in a structured avatar latent space, restricting updates to a low-dimensional, part-specific edit subspace to prevent unintended identity changes. Crucially, we design the editing constraints during inversion by optimizing a conditioning objective derived from a local linearization of the full decoding-and-rendering pipeline, yielding an edit-subspace information matrix whose spectrum predicts stability and drives frame reweighting / keyframe activation. The resulting method operates on small subspace matrices and can be implemented efficiently (e.g., via Hessian-vector products), and improves stability under limited edited supervision.
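
The abstract does not spell out how keyframes are activated, so the following is only a sketch under assumptions: a greedy log-determinant (D-optimal) heuristic is swapped in as a stand-in for the paper's conditioning objective. The per-frame Jacobians are hand-written toy matrices, and `greedy_keyframes` is a hypothetical helper name.

```python
import numpy as np

def greedy_keyframes(jacobians, budget, ridge=1e-6):
    """Greedily activate frames that most increase log det of sum_t J_t^T J_t.

    This is an assumed stand-in criterion, not the paper's objective.
    """
    k = jacobians[0].shape[1]
    H = ridge * np.eye(k)          # small ridge keeps log det finite at the start
    active = []
    for _ in range(min(budget, len(jacobians))):
        best_t, best_score = None, -np.inf
        for t, J in enumerate(jacobians):
            if t in active:
                continue
            _, logdet = np.linalg.slogdet(H + J.T @ J)
            if logdet > best_score:
                best_t, best_score = t, logdet
        active.append(best_t)
        H = H + jacobians[best_t].T @ jacobians[best_t]
    return active, H

# Toy Jacobians over a 2-D edit subspace: frames 0 and 1 only observe the
# first edit direction, frame 2 only observes the second.
jacobians = [
    np.array([[3.0, 0.0]]),
    np.array([[2.0, 0.0]]),
    np.array([[0.0, 1.0]]),
]
active, H = greedy_keyframes(jacobians, budget=2)
print("activated frames:", active)
print("smallest eigenvalue:", np.linalg.eigvalsh(H)[0])
```

In this toy case the second pick is frame 2 rather than frame 1: frame 1 is more sensitive, but it only repeats information about a direction that frame 0 already constrains, so it does nothing for the weakly determined direction.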