Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations

arXiv cs.LG / 4/8/2026

Key Points

The paper analyzes why applying SVD-based orthogonalization during training can degrade gradient quality in rotation estimation, explaining the prior empirical finding that deferring orthogonalization to inference improves results.
It derives the exact spectrum of the SVD backward-pass Jacobian for projecting 3×3 matrices onto SO(3): the Jacobian has rank 3, nonzero singular values 2/(s_i + s_j), and condition number κ = (s1 + s2)/(s2 + s3), where s1 ≥ s2 ≥ s3 are the singular values of the predicted matrix.
The resulting gradient distortion is most severe when the predicted matrix is far from SO(3), for example early in training when s3 ≈ 0.
It shows that even stabilized SVD gradients introduce gradient direction error, and argues that removing SVD from the training loop avoids this tradeoff altogether.
The authors also compare against the 6D Gram-Schmidt parameterization, proving that its Jacobian has an asymmetric spectrum so that its parameters receive unequal gradient signal; this offers theoretical support for 9D regression.
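The spectrum result can be spot-checked numerically. The NumPy sketch below is our own illustration, not the authors' code. It assumes the standard special-orthogonal Procrustes projection R = U diag(1, 1, det(U V^T)) V^T, a predicted matrix with det(M) > 0 and distinct singular values, and hypothetical helper names (`svd_project`, `numerical_jacobian`, `random_rotation`). It differentiates the projection by central finite differences and compares the singular values of the resulting 9×9 Jacobian with 2/(s_i + s_j) and κ.

```python
import numpy as np


def svd_project(m: np.ndarray) -> np.ndarray:
    """Project a 3x3 matrix onto SO(3) via SVD (special orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def numerical_jacobian(f, m: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """9x9 Jacobian of vec(f(M)) with respect to vec(M), by central differences."""
    jac = np.zeros((9, 9))
    for k in range(9):
        e = np.zeros(9)
        e[k] = h
        e = e.reshape(3, 3)
        jac[:, k] = (f(m + e) - f(m - e)).ravel() / (2 * h)
    return jac


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random rotation from QR of a Gaussian matrix (good enough for a spot check)."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # force det(q) = +1
    return q


rng = np.random.default_rng(0)
s = np.array([2.0, 1.0, 0.1])  # assumed singular values, s1 >= s2 >= s3 > 0
m = random_rotation(rng) @ np.diag(s) @ random_rotation(rng).T  # det(m) > 0

jac = numerical_jacobian(svd_project, m)
sv = np.linalg.svd(jac, compute_uv=False)  # sorted in descending order

predicted = np.sort([2 / (s[0] + s[1]), 2 / (s[0] + s[2]), 2 / (s[1] + s[2])])[::-1]
kappa = (s[0] + s[1]) / (s[1] + s[2])

print("numerical top-3 singular values:", np.round(sv[:3], 5))
print("predicted 2/(s_i+s_j):          ", np.round(predicted, 5))
print("numerical rank:", int(np.sum(sv > 1e-6 * sv[0])))
print("kappa (numeric, formula):", sv[0] / sv[2], kappa)
```

With s = (2, 1, 0.1) the formula gives κ = 3/1.1 ≈ 2.73, while a matrix already on SO(3) (s1 = s2 = s3 = 1) gives κ = 1. Finite differences are used only to keep the check independent of any particular closed-form backward pass; the step h = 1e-6 is an arbitrary choice that is adequate for these singular values.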

Abstract

Recent work has shown that removing orthogonalization during training and applying it only at inference improves rotation estimation in deep learning, with empirical evidence favoring 9D representations with SVD projection. However, the theoretical understanding of why SVD orthogonalization specifically harms training, and why it should be preferred over Gram-Schmidt at inference, remains incomplete. We provide a detailed gradient analysis of SVD orthogonalization specialized to $3 \times 3$ matrices and $SO(3)$ projection. Our central result derives the exact spectrum of the SVD backward pass Jacobian: it has rank $3$ (matching the dimension of $SO(3)$) with nonzero singular values $2/(s_i + s_j)$ and condition number $\kappa = (s_1 + s_2)/(s_2 + s_3)$, creating quantifiable gradient distortion that is most severe when the predicted matrix is far from $SO(3)$ (e.g., early in training when $s_3 \approx 0$). We further show that even stabilized SVD gradients introduce gradient direction error, whereas removing SVD from the training loop avoids this tradeoff entirely. We also prove that the 6D Gram-Schmidt Jacobian has an asymmetric spectrum: its parameters receive unequal gradient signal, explaining why 9D parameterization is preferable. Together, these results provide the theoretical foundation for training with direct 9D regression and applying SVD projection only at inference.
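The contrast between 6D Gram-Schmidt and direct 9D regression can be illustrated with the same finite-difference approach. The sketch below assumes the common 6D parameterization in which two 3D columns a1, a2 are orthonormalized into b1, b2 and completed with b3 = b1 × b2; the evaluation point (a1 = e1, a2 = e2) and the helper names (`gram_schmidt_6d`, `numerical_jacobian`) are illustrative choices of ours. Comparing per-column Jacobian block norms is just one simple way to expose unequal gradient signal; it is not the paper's proof. In direct 9D regression the training loss sees the raw matrix, so the output-to-parameter Jacobian is the 9×9 identity.

```python
import numpy as np


def gram_schmidt_6d(x: np.ndarray) -> np.ndarray:
    """Map a 6D vector (columns a1, a2) to a rotation matrix via Gram-Schmidt."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1  # remove the component of a2 along b1
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)  # columns b1, b2, b3


def numerical_jacobian(f, x: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Jacobian of vec(f(x)) with respect to x (9 x len(x)), by central differences."""
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        cols.append((f(x + e) - f(x - e)).ravel() / (2 * h))
    return np.stack(cols, axis=1)


# Evaluate at the identity rotation: a1 = e1, a2 = e2 (an illustrative point).
x0 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
jac_6d = numerical_jacobian(gram_schmidt_6d, x0)

# Frobenius norm of the Jacobian block belonging to each 3D input column.
norm_a1 = np.linalg.norm(jac_6d[:, :3])
norm_a2 = np.linalg.norm(jac_6d[:, 3:])
print("6D block norms  a1: %.4f  a2: %.4f" % (norm_a1, norm_a2))

# Direct 9D regression: the trained output is the raw matrix itself, so the
# Jacobian is the 9x9 identity and every entry gets the same gradient signal.
jac_9d = numerical_jacobian(lambda m: m.reshape(3, 3), np.eye(3).ravel())
print("9D singular values:", np.round(np.linalg.svd(jac_9d, compute_uv=False), 4))
```

At this point the a1 block has norm 2 and the a2 block has norm √2 ≈ 1.414. The imbalance follows from the construction: a1 alone fixes b1 and so moves the frame about two axes, whereas a2 contributes only its component orthogonal to b1 and moves it about one. Under the train-9D, infer-SVD scheme, the SVD projection is applied only when the final rotation is produced.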
