Ortho-Hydra: Orthogonalized Experts for DiT LoRA

arXiv cs.LG / 5/6/2026

💬 Opinion · Models & Research

Key Points

  • The paper identifies a key failure mode in mixture-of-experts LoRA for diffusion transformers (DiT) with multi-style data: style bleed, where a low-rank residual can’t represent multiple distinct artist “fingerprints,” so the optimizer converges to an average of the styles.
  • It explains that HydraLoRA’s cold-start can deadlock when experts are zero-initialized, because the router receives identical gradients from all experts and stays at a uniform prior, leading experts to evolve symmetrically and effectively behave like a single-rank LoRA at much higher cost.
  • Ortho-Hydra is proposed as a re-parameterization that uses an OFT-style Cayley-orthogonal shared basis plus per-expert disjoint output subspaces derived from the top-(E·r) left singular vectors of the pretrained weight.
  • The disjoint subspaces make the router’s per-expert scores non-degenerate at step 0, enabling specialization signals to appear before any expert has meaningfully trained.
  • Experiments on a DiT pipeline compare Ortho-Hydra against two HydraLoRA baselines (a zero-initialized shared-basis variant and a Gaussian-jitter mitigation with σ=0.1), showing that the baselines fail to leave the uniform prior within the first 1k steps while Ortho-Hydra begins de-uniformizing within a few hundred steps; the paper focuses on construction and routing dynamics rather than final image generation quality.
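
The cold-start deadlock in the second point can be checked with a few lines of linear algebra. Below is a toy NumPy model of a HydraLoRA-style layer (shapes, the input, and the upstream loss gradient are made up for illustration, not taken from the paper's code): with every expert's up-projection zero-initialized, the gradient with respect to each expert is identical and the gradient with respect to the router logits vanishes, so nothing breaks the permutation symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, E = 8, 2, 3                             # toy feature dim, rank, expert count

A = rng.normal(size=(r, d))                   # shared down-projection
B = [np.zeros((d, r)) for _ in range(E)]      # zero-initialized expert up-projections
logits = np.zeros(E)                          # router starts at the uniform prior

x = rng.normal(size=d)                        # stand-in token activation
g = np.exp(logits) / np.exp(logits).sum()     # softmax gates: all equal to 1/E
h = A @ x
f = [B[e] @ h for e in range(E)]              # every expert output is exactly zero
y = sum(g[e] * f[e] for e in range(E))        # mixture output: zero

dL_dy = rng.normal(size=d)                    # stand-in upstream gradient

# Gradient w.r.t. expert e is g_e * outer(dL/dy, h): identical for every expert,
# so one SGD step moves all experts the same way and the symmetry persists.
dB = [g[e] * np.outer(dL_dy, h) for e in range(E)]
assert all(np.allclose(dB[0], dB[e]) for e in range(E))

# Gradient w.r.t. router logit e is g_e * dL/dy . (f_e - y): zero when all f_e = y,
# so the router never receives a signal to leave the uniform prior.
dlogit = np.array([g[e] * dL_dy @ (f[e] - y) for e in range(E)])
assert np.allclose(dlogit, 0.0)
```

The same algebra explains why a small Gaussian jitter only weakly mitigates the problem: the logit gradient scales with the differences between expert outputs, which start near zero under σ=0.1 noise.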

Abstract

LoRA fine-tuning of diffusion transformers (DiT) on multi-style data suffers from *style bleed*: a single low-rank residual cannot represent several distinct artist fingerprints, and the optimizer converges to their average. Mixture-of-experts LoRA in the HydraLoRA style replaces the up-projection with E heads under a router, but when every expert is zero-initialized the router receives identical gradients from each head and remains at the uniform prior. The experts then evolve permutation-symmetrically, and the network trains as a single rank-r LoRA at E× the cost. We present **Ortho-Hydra**, a re-parameterization that combines an OFT-style Cayley-orthogonal shared basis with per-expert *disjoint output subspaces* carved from the top-(E·r) left singular vectors of the pretrained weight. Disjointness makes the router's per-expert score non-degenerate at step 0, so specialization receives gradient signal before any expert has trained. We test the predicted deadlock on a DiT pipeline by comparing two HydraLoRA baselines, a zero-initialized shared-basis variant and the original σ=0.1 Gaussian-jitter mitigation, against Ortho-Hydra under a matched optimizer, dataset, and step budget. Neither baseline leaves the uniform prior within the first 1k steps; Ortho-Hydra begins de-uniformizing within the first few hundred. End-task generation quality on multi-style data is out of scope; we report the construction, the cold-start mechanism, and the routing dynamics it changes. Code: https://github.com/sorryhyun/anima_lora.
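
The two ingredients of the construction can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the matrix sizes are toy values, the pretrained weight is a random stand-in, and how the pieces are assembled into the full adapter is assumed from the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, E = 16, 12, 2, 3              # toy dims, rank, expert count

W = rng.normal(size=(d_out, d_in))            # stand-in for a pretrained DiT weight

# OFT-style Cayley-orthogonal shared basis: Q = (I - S)(I + S)^{-1}
# with S skew-symmetric, so Q is exactly orthogonal by construction.
M = 0.01 * rng.normal(size=(d_in, d_in))
S = M - M.T                                    # skew-symmetric parameterization
I = np.eye(d_in)
Q = (I - S) @ np.linalg.inv(I + S)
assert np.allclose(Q @ Q.T, I)

# Per-expert disjoint output subspaces: partition the top-(E*r) left
# singular vectors of W into E blocks of r columns each.
U, _, _ = np.linalg.svd(W, full_matrices=False)
U_top = U[:, : E * r]
subspaces = [U_top[:, e * r : (e + 1) * r] for e in range(E)]  # each d_out x r

# Disjointness: the blocks are mutually orthogonal, so each expert writes
# into a different part of the output space and router scores differ at step 0.
assert np.allclose(subspaces[0].T @ subspaces[1], 0.0)
```

Because singular vectors of a matrix are orthonormal, the E blocks are orthogonal by construction; the router therefore sees E genuinely different per-expert outputs from the very first forward pass, which is the mechanism the abstract credits for escaping the uniform prior.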