Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
arXiv cs.LG / 4/3/2026
Key Points
- Ouroboros introduces an input-conditioned “Controller” hypernetwork for recursive transformers, generating a per-recurrence diagonal modulation vector so each depth step can perform distinct, hidden-state-dependent transformations.
- The approach keeps the recursive transformer’s main weights frozen and uses SVD-initialized LoRA bases that are modulated at each step, adding only 9.2M trainable parameters while enabling input-dependent depth behavior (sketched in the first code example after this list).
- Stability and effective deep iteration come from gated recurrence (with a strong initial retention bias) and per-step LayerNorm; the paper reports that gated recurrence is essential, as removing it degrades performance (see the second sketch after this list).
- On a Qwen2.5-3B “Prelude/Recurrent/Coda” setup with partial layer retention, Ouroboros reduces training loss by 43.4% over an unmodified 17-layer baseline and recovers 51.3% of the performance lost by removing layers.
- Despite strong training-distribution gains, the Controller does not yet outperform the baseline on held-out text, a gap the authors attribute to the frozen downstream layers and analyze in more detail.
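
The first two points describe the Controller as a hypernetwork that reads the current hidden state and emits a diagonal modulation vector over SVD-initialized LoRA bases attached to frozen base weights. The PyTorch sketch below illustrates that mechanism under stated assumptions: the names (`ModulatedLoRALinear`, `controller`), the mean-pooling of the hidden state, the rank `r`, and the zero-initialized controller head are illustrative choices, not the paper’s implementation.

```python
import torch
import torch.nn as nn


class ModulatedLoRALinear(nn.Module):
    """A frozen linear layer plus a LoRA branch whose rank-space activations are
    rescaled by an input-conditioned diagonal modulation vector.

    Sketch only: module names, pooling choice, rank r, and initialization are
    assumptions for illustration, not the paper's code.
    """

    def __init__(self, base: nn.Linear, r: int = 16, ctrl_hidden: int = 128):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the recursive transformer's main weights stay frozen

        d_out, d_in = base.weight.shape
        # SVD-initialized LoRA bases: top-r singular directions of the frozen weight.
        U, S, Vh = torch.linalg.svd(base.weight.data, full_matrices=False)
        self.lora_A = nn.Parameter(Vh[:r, :].clone())            # (r, d_in)
        self.lora_B = nn.Parameter((U[:, :r] * S[:r]).clone())   # (d_out, r)

        # Controller hypernetwork: pooled hidden state -> diagonal modulation in R^r.
        self.controller = nn.Sequential(
            nn.Linear(d_in, ctrl_hidden),
            nn.GELU(),
            nn.Linear(ctrl_hidden, r),
        )
        nn.init.zeros_(self.controller[-1].weight)  # modulation starts at zero, so the
        nn.init.zeros_(self.controller[-1].bias)    # layer initially matches the frozen model

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_in); condition the controller on the pooled hidden state.
        m = torch.tanh(self.controller(h.mean(dim=1)))    # (batch, r) diagonal modulation
        delta = (h @ self.lora_A.T) * m.unsqueeze(1)      # rescale rank-space activations
        return self.base(h) + delta @ self.lora_B.T
```

In a recursive transformer, the same module would be reused at every depth step, with the modulation vector recomputed from that step’s hidden state, so the effective weights change per recurrence even though the base weights are shared and frozen.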
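
The stability recipe in the third point combines gated recurrence, a strong initial bias toward retaining the previous state, and a separate LayerNorm per recursion step. The sketch below shows one way to wire that around a shared recurrent block; the gate parameterization and the +2.0 retention bias (sigmoid ≈ 0.88 toward keeping the old state) are assumptions for illustration.

```python
import torch
import torch.nn as nn


class GatedRecurrentDepth(nn.Module):
    """Iterate a shared block over depth, mixing each new output into the
    running state through a learned gate, with a LayerNorm per step.

    Sketch only: the gate design and retention_bias value are illustrative.
    """

    def __init__(self, block: nn.Module, d_model: int, num_steps: int,
                 retention_bias: float = 2.0):
        super().__init__()
        self.block = block
        self.num_steps = num_steps
        self.gate = nn.Linear(2 * d_model, d_model)
        nn.init.zeros_(self.gate.weight)
        nn.init.constant_(self.gate.bias, retention_bias)  # strong initial retention bias
        self.step_norms = nn.ModuleList(
            [nn.LayerNorm(d_model) for _ in range(num_steps)]  # per-step LayerNorm
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for t in range(self.num_steps):
            update = self.block(self.step_norms[t](h))          # candidate update at depth t
            g = torch.sigmoid(self.gate(torch.cat([h, update], dim=-1)))
            h = g * h + (1.0 - g) * update                      # gated recurrence
        return h
```

With the gate bias initialized positive, early training mostly carries the previous state forward, which keeps deep iteration stable while the gate learns when to accept each step’s update.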