Nucleus-Image: Sparse MoE for Image Generation
arXiv cs.CV / 4/15/2026
Key Points
- Nucleus-Image is a text-to-image diffusion transformer that uses a sparse mixture-of-experts (MoE) backbone with expert-choice routing to match or exceed leading-model quality while activating only ~2B parameters per forward pass (see the routing sketch below).
- The model scales to 17B total parameters across 64 routed experts per layer and improves inference efficiency by excluding text tokens from the transformer backbone and reusing text keys/values (KV) across denoising timesteps (KV-cache sketch below).
- To stabilize routing under timestep modulation, it introduces a decoupled routing scheme that separates timestep-aware expert assignment from timestep-conditioned expert computation (decoupled-router sketch below).
- Training used 1.5B high-quality text-image pairs (700M unique images) with multi-stage filtering, deduplication, and aesthetic tiering, plus a progressive resolution curriculum up to 1024×1024 and progressive expert-capacity sparsification (schedule sketch below).
- The authors report strong benchmark performance without post-training methods such as reinforcement learning or preference optimization, and they release an open-source training recipe, positioning Nucleus-Image as a first-of-its-kind open MoE diffusion model at this quality level.
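
The digest doesn't include code, but expert-choice routing is easy to sketch: instead of each token picking its top experts, each expert picks the tokens it scores highest, up to a fixed capacity, which balances expert load by construction. A minimal PyTorch sketch under assumed shapes; all names (`ExpertChoiceMoE`, `capacity`, etc.) are illustrative, not from the paper:

```python
import torch
import torch.nn as nn


class ExpertChoiceMoE(nn.Module):
    """Sparse MoE layer with expert-choice routing (illustrative sketch).

    Each expert selects its top-`capacity` tokens, so compute per expert is
    balanced by construction; tokens no expert picks pass through unchanged
    via the surrounding residual connection.
    """

    def __init__(self, dim: int, num_experts: int, capacity: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.capacity = capacity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim); flatten batch and sequence dims before calling.
        scores = self.gate(x).softmax(dim=0)          # normalize over tokens
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Expert e picks the `capacity` tokens it scores highest.
            w, idx = scores[:, e].topk(self.capacity)
            out.index_add_(0, idx, w.unsqueeze(-1) * expert(x[idx]))
        return out


moe = ExpertChoiceMoE(dim=128, num_experts=8, capacity=64)
print(moe(torch.randn(256, 128)).shape)               # torch.Size([256, 128])
```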
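Reusing text KV across timesteps follows from a simple observation: the prompt embedding is constant over the whole denoising trajectory, so its cross-attention key/value projections can be computed once per prompt and cached. A hedged sketch of that pattern (class and method names are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CachedTextCrossAttention(nn.Module):
    """Cross-attention that projects text K/V once per prompt, not per step."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.out = nn.Linear(dim, dim)

    def cache_text(self, text_emb: torch.Tensor):
        # Run once per prompt; the result is valid at every timestep.
        return self.kv(text_emb).chunk(2, dim=-1)

    def forward(self, img_tokens: torch.Tensor, cached_kv) -> torch.Tensor:
        k, v = cached_kv
        b, n, d = img_tokens.shape
        h = self.num_heads
        q = self.q(img_tokens).view(b, n, h, d // h).transpose(1, 2)
        k = k.view(b, -1, h, d // h).transpose(1, 2)
        v = v.view(b, -1, h, d // h).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        return self.out(attn.transpose(1, 2).reshape(b, n, d))


attn = CachedTextCrossAttention(dim=128)
kv = attn.cache_text(torch.randn(1, 77, 128))  # text K/V computed once
x = torch.randn(1, 256, 128)
for step in range(50):                         # denoising loop reuses the cache
    x = x + attn(x, kv)
```

Keeping text tokens out of the backbone's self-attention means this cached cross-attention is the only place the prompt is consulted, which is where the per-step savings come from.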
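One plausible reading of the decoupled routing point: expert *assignment* receives the timestep through a separate, stable additive embedding, while expert *computation* consumes the usual AdaLN-style scale/shift-modulated tokens, so routing scores are insulated from per-step modulation noise. The sketch below is an assumption about the mechanism, not the paper's implementation:

```python
import torch
import torch.nn as nn


class DecoupledRoutedMoE(nn.Module):
    """Decoupled routing sketch: assignment and computation see the timestep
    through different pathways. Hypothetical reading of the digest summary.
    """

    def __init__(self, dim: int, num_experts: int, capacity: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.t_embed = nn.Linear(dim, dim)         # timestep path for assignment
        self.modulation = nn.Linear(dim, 2 * dim)  # scale/shift for computation
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.capacity = capacity

    def forward(self, x: torch.Tensor, t_cond: torch.Tensor) -> torch.Tensor:
        # Assignment: raw tokens plus a timestep embedding -- no per-step
        # scale/shift, so routing scores drift less across timesteps.
        scores = self.router(x + self.t_embed(t_cond)).softmax(dim=0)
        # Computation: experts consume the timestep-modulated tokens.
        scale, shift = self.modulation(t_cond).chunk(2, dim=-1)
        h = x * (1 + scale) + shift
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w, idx = scores[:, e].topk(self.capacity)   # expert-choice pick
            out.index_add_(0, idx, w.unsqueeze(-1) * expert(h[idx]))
        return out


moe = DecoupledRoutedMoE(dim=128, num_experts=8, capacity=64)
x, t = torch.randn(256, 128), torch.randn(128)   # tokens, timestep embedding
print(moe(x, t).shape)                           # torch.Size([256, 128])
```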
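The staged recipe (rising resolution, shrinking expert capacity) amounts to a training schedule. The sketch below shows the shape of such a schedule; every number in it is an invented placeholder, not a value from the paper:

```python
# Hypothetical schedule: resolution rises while the expert-capacity factor
# shrinks, so the model ends sparser than it starts. All numbers are
# illustrative placeholders.
STAGES = [
    {"resolution": 256,  "capacity_factor": 2.0, "steps": 200_000},
    {"resolution": 512,  "capacity_factor": 1.0, "steps": 100_000},
    {"resolution": 1024, "capacity_factor": 0.5, "steps": 50_000},
]


def capacity(num_tokens: int, num_experts: int, factor: float) -> int:
    # Tokens each expert may select under expert-choice routing.
    return max(1, int(factor * num_tokens / num_experts))


for stage in STAGES:
    n_tokens = (stage["resolution"] // 16) ** 2   # e.g. patch size 16
    cap = capacity(n_tokens, num_experts=64, factor=stage["capacity_factor"])
    print(f"res {stage['resolution']}: {cap} tokens per expert")
```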