> I'm monitoring an experimental model's ongoing training. I replaced the MLP decoders of a traditional transformer with discrete lower-dimensional spline manifold geometry described in my K-Splanifolds paper. The image shows how layer 96 of 128 developed over 5B tokens trained. The 18M model works surprisingly well and loss is reducing, so I'll continue to train it until I see evidence it is stagnating. Just thought you all might find this look at its development interesting.
Here's how my LLM's decoder block changed while training on 5B tokens
Reddit r/LocalLLaMA / 4/12/2026
💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research
Key Points
- The author describes an experimental LLM training run in which they replaced a transformer’s MLP decoder blocks with a discrete, lower-dimensional spline manifold geometry approach from their K‑Splanifolds paper.
- They report monitoring the model across training on 5B tokens and show how layer 96 out of 128 evolves visually during that process.
- They state that the resulting ~18M-parameter model performs surprisingly well and that training loss continues to decrease.
- The author plans to keep training until the loss shows signs of stagnating, treating the run as an informal validation of the modified decoder design.
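The post does not include implementation details of the K-Splanifolds approach, so the following is only a hedged sketch of the general idea it gestures at: swapping a transformer block's MLP for a map through a learned, low-dimensional, discretely-knotted spline. All names and dimensions here (`D_MODEL`, `D_LOW`, `KNOTS`, `spline_decoder_block`) are illustrative assumptions, not the author's architecture.

```python
# Hypothetical sketch only: the K-Splanifolds paper's details are not in the
# source post. This illustrates replacing an MLP sub-block with a projection
# onto low-dimensional coordinates, a per-coordinate piecewise-linear spline
# over a discrete knot grid, and a projection back to model width.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64   # model width (assumed)
D_LOW = 8      # lower-dimensional manifold coordinates (assumed)
KNOTS = 16     # discrete knots per spline coordinate (assumed)

# Down/up projections and learned knot values.
W_down = rng.standard_normal((D_MODEL, D_LOW)) / np.sqrt(D_MODEL)
W_up = rng.standard_normal((D_LOW, D_MODEL)) / np.sqrt(D_LOW)
knot_x = np.linspace(-3.0, 3.0, KNOTS)              # fixed knot positions
knot_y = rng.standard_normal((D_LOW, KNOTS)) * 0.1  # learned knot heights

def spline_decoder_block(h):
    """Apply spline(h @ W_down) @ W_up with a residual, in place of MLP(h)."""
    z = h @ W_down                                   # (T, D_LOW) coordinates
    out = np.empty_like(z)
    for j in range(D_LOW):                           # 1-D spline per coordinate
        out[:, j] = np.interp(z[:, j], knot_x, knot_y[j])
    return h + out @ W_up                            # residual connection

tokens = rng.standard_normal((5, D_MODEL))           # 5 token hidden states
y = spline_decoder_block(tokens)
print(y.shape)  # (5, 64)
```

One plausible reason such a swap can shrink parameter count, consistent with the ~18M figure in the post: the spline path costs roughly `2 * D_MODEL * D_LOW + D_LOW * KNOTS` parameters per block, versus `~8 * D_MODEL^2` for a standard 4x-expansion MLP.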