What happens when you rip out the residual stream and replace it with a structured workspace (Research Paper - CWT)

Reddit r/LocalLLaMA / 4/19/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The author describes CWT, a custom architecture that replaces the residual stream used by transformers with a structured workspace, framed as a structural thought experiment rather than a claim of “beating” transformers.
  • CWT reports 22.9M core compute (attention + FFN) versus 41.7M for the compute-matched baseline, and its perplexity (PPL) comes within ~1.7% of that baseline.
  • That is a ~45% reduction in core compute at near-equivalent quality, which highlights where the computation is actually spent.
  • A key advantage of the structured workspace is interpretability: it allows per-token tracking and 3D visualizations that are difficult to produce from a standard transformer's residual stream.
  • The paper, model weights, and code are released as open source, and the author invites feedback while noting compute and monetary constraints limited the scope.

Over the last month I've been working on a custom architecture that fully replaces the residual stream transformers use with a structured workspace.

The goal isn't to claim "I beat transformers". It's a thought experiment into what happens structurally when you enforce a workspace instead, and where the compute actually goes.

The findings were fun to discover and very interesting.

CWT has 22.9M core compute (attn+FFN) vs 41.7M in the compute-matched baseline, and comes within 1.7% PPL. That is a ~45% gap in core compute for near-equivalent quality.
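For anyone who wants to check the ~45% figure, here is a minimal sketch of the arithmetic using only the two numbers quoted above. The function and variable names are illustrative and not taken from the CWT code, and the unit behind "M" is whatever the post uses; the ratio does not depend on it.

```python
def relative_reduction(baseline: float, candidate: float) -> float:
    """Return the fraction by which `candidate` is smaller than `baseline`."""
    return (baseline - candidate) / baseline


# Figures quoted in the post (core compute = attention + FFN).
# Names are hypothetical; only the two values come from the post.
cwt_core = 22.9e6
baseline_core = 41.7e6

reduction = relative_reduction(baseline_core, cwt_core)
print(f"core compute reduction: {reduction:.1%}")
```

Put the other way round, CWT's core compute is a bit over half of the baseline's (22.9 / 41.7), which is where the "~45% gap" comes from.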

The other thing a structured workspace gives you is full visibility into how the model operates on a per-token basis. You can watch and record it as 3D visuals, which standard transformers can't really offer easily, if at all.

All code, model weights, and the paper are open source. This is my first proper research paper, so feedback and ideas are fully welcome.

Paper:

https://steel-skull.github.io/CWT-V5.6/

Model:

https://huggingface.co/Steelskull/CWT-V5.6

Model code:

https://github.com/Steel-skull/CWT-V5.6

PS: there were compute and monetary constraints on this project, as I was paying out of pocket, so please understand that some things are limited in scope.

submitted by /u/mentallyburnt