State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning
arXiv cs.LG / 5/4/2026
Key Points
- The State Stream Transformer (SST) V2 proposes a parameter-efficient way to retain and stream a rich latent residual state across positions, instead of reconstructing latent reasoning context from scratch at every token.
- SST V2 introduces an FFN-driven nonlinear recurrence within each decoder layer, using a learned horizontal blend to carry latent states through the full sequence and enabling extra “deliberation” at inference time via additional FLOPs (a minimal sketch of this mechanism follows the list).
- The paper presents a two-pass parallel training method to handle the otherwise sequential dependency created by the recurrence, making compute-efficient training feasible (one possible parallelization is sketched below the list).
- Co-trained into an existing 27B backbone using a small GSM8K-only dataset, SST V2 improves out-of-distribution GPQA-Diamond performance by 15.15 points and reduces remaining GSM8K errors by 46%, suggesting the gains come from the architectural mechanism rather than exposure to new task data.
- Analysis and probing indicate that the latent-state exploration moves the model across distinct “semantic basins” in continuous latent space, and that a probe on the first generated token can already predict whether the eventual answer will hold or change after additional latent computation at later positions (a toy probe setup closes the sketches below).
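A minimal sketch of what an FFN-driven horizontal recurrence with a learned blend inside one decoder layer could look like, written in PyTorch. The class name `LatentBlendLayer`, the scalar gate `blend_logit`, and the placement of the blend before the FFN are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn


class LatentBlendLayer(nn.Module):
    """Decoder-layer FFN with a horizontal (position-to-position) latent stream.

    Sketch under assumptions: a streamed latent state is mixed into each
    position's residual via a learned blend, then updated nonlinearly by the FFN.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Learned blend weight controlling how much streamed state is carried
        # into the current position (illustrative parameterization).
        self.blend_logit = nn.Parameter(torch.zeros(1))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        """h: (batch, seq_len, d_model) residual stream after attention."""
        alpha = torch.sigmoid(self.blend_logit)
        state = torch.zeros_like(h[:, 0])            # streamed latent state
        outputs = []
        for t in range(h.size(1)):                   # sequential recurrence
            mixed = alpha * state + (1.0 - alpha) * h[:, t]
            state = mixed + self.ffn(self.norm(mixed))   # nonlinear update via FFN
            outputs.append(state)
        return torch.stack(outputs, dim=1)
```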
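The paper's exact two-pass scheme is not reproduced here; the sketch below shows one plausible form, in which a first position-parallel pass produces provisional latent states and a second pass consumes them shifted right by one position in place of the true sequential carry. The function `two_pass_parallel_step` and the detach choice are assumptions for illustration.

```python
import torch
import torch.nn as nn


def two_pass_parallel_step(ffn: nn.Module, h: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """One plausible two-pass parallelization of the horizontal recurrence (assumed)."""
    # Pass 1: provisional latent states, computed position-parallel with no
    # horizontal input yet.
    provisional = h + ffn(h)                          # (batch, seq, d_model)

    # Shift so position t sees the provisional state from t-1; detaching keeps
    # the second pass's graph shallow (a design choice, not the paper's claim).
    prev = torch.zeros_like(provisional)
    prev[:, 1:] = provisional[:, :-1].detach()

    # Pass 2: blend the streamed state into each position, then apply the FFN
    # in parallel over the whole sequence.
    mixed = alpha * prev + (1.0 - alpha) * h
    return mixed + ffn(mixed)


# Example usage with a toy FFN (dimensions are illustrative).
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
h = torch.randn(2, 16, 64)
out = two_pass_parallel_step(ffn, h, alpha=torch.sigmoid(torch.zeros(1)))
print(out.shape)  # torch.Size([2, 16, 64])
```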
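As a toy illustration of the kind of probing described in the last point, one could fit a linear probe on hidden states captured at the first generated token to predict whether the final answer later changes. The file names and data below are hypothetical, not artifacts from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: hidden states at the first generated token
# (n_examples, d_model) and binary labels for whether the eventual answer
# changed after further latent computation.
X = np.load("first_token_states.npy")        # assumed dump, not from the paper
y = np.load("answer_changed_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)    # simple linear probe
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```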