Over the last month I've been working on a custom architecture that fully replaces the residual stream transformers use with a structured workspace.
The goal isn't to claim "I beat transformers"; it's a thought experiment about what happens structurally when you enforce a workspace instead, and where the compute actually goes. The findings were a lot of fun to uncover and, I think, genuinely interesting.
CWT uses 22.9M of core compute (attn + FFN) vs 41.7M in the compute-matched baseline, and comes within 1.7% PPL: roughly a ~45% reduction in core compute for near-equivalent quality.
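As a quick sanity check, the ~45% figure follows directly from the two core-compute numbers quoted above (a minimal sketch; the 22.9M and 41.7M values are the ones stated in this post):

```python
# Headline numbers from the post: core compute (attention + FFN)
# for CWT vs. the compute-matched baseline.
cwt_core = 22.9e6       # CWT core compute
baseline_core = 41.7e6  # baseline core compute

# Relative reduction in core compute.
reduction = 1 - cwt_core / baseline_core
print(f"Core-compute reduction: {reduction:.1%}")  # ~45.1%
```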
The other thing a structured workspace gives you is full visibility into how the model operates on a per-token basis. You can watch and record it as 3D visuals, something standard transformers can't easily offer, if at all.
All code, model weights, and the paper are open source. This is my first proper research paper; feedback and ideas are very welcome.
Paper:
https://steel-skull.github.io/CWT-V5.6/
Model:
https://huggingface.co/Steelskull/CWT-V5.6
Model code:
https://github.com/Steel-skull/CWT-V5.6
PS: there were compute and monetary constraints on this project, as I was paying out of pocket, so please understand that some things are limited in scope.


