End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer
arXiv cs.CV / 5/4/2026
Key Points
- The paper proposes an end-to-end autoregressive image generation framework that trains a 1D semantic tokenizer jointly with the generative model, so the tokenizer is supervised directly by generation outcomes.
- Unlike prior two-stage methods that separately train tokenizers and image generators, the approach optimizes reconstruction and generation together in a single pipeline.
- The authors also explore using vision foundation models to improve 1D tokenizers for autoregressive image modeling.
- The resulting autoregressive model achieves an FID of 1.48 on ImageNet 256×256 generation without guidance, which the authors describe as state of the art.
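
For readers unfamiliar with the FID metric cited above, it is the Fréchet distance between two Gaussians fitted to image features of real and generated images, and lower is better. The sketch below is illustrative only and is not the paper's evaluation code: it assumes the features have already been extracted (standard FID uses Inception-v3 features), and the function name `frechet_distance` and the random stand-in features in the usage lines are hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, feature_dim).
    Assumption: features are precomputed; no image model is run here.
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; tiny negative or imaginary parts are
    # numerical noise, so keep the real part and clip at zero.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Usage with random stand-in features (hypothetical data, not real image features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
fake_feats = real_feats + np.array([1.0, 0.0, 0.0, 2.0])  # shift the mean only
print(frechet_distance(real_feats, real_feats))  # close to 0 for identical sets
print(frechet_distance(real_feats, fake_feats))  # close to 1^2 + 2^2 = 5
```

Because the second set only shifts the mean, the covariance terms cancel and the distance reduces to the squared length of the shift, which makes the example easy to check by hand.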