I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found

Reddit r/LocalLLaMA / 4/13/2026


Key Points

  • The author reports successfully training a pure spiking neural network (SNN) from random initialization up to 1.088B parameters for language modeling, reaching a loss of 4.4 after stopping at 27k steps due to compute budget limits.
  • The model exhibits high sparsity, maintaining about 93% sparse activity with only ~7% of neurons firing per token during inference, suggesting strong memory-efficiency potential.
  • At around 25k steps, the system began generating structurally correct Russian text without being explicitly targeted in the dataset mix, indicating emergent cross-lingual behavior.
  • When scaling from ~600M to 1B parameters, the architecture reportedly shifted ~39% of its activations into a persistent memory module, suggesting that training at larger scale can change internal routing toward memory usage.
  • The post shares code and checkpoints on GitHub and solicits feedback on neuromorphic deployment (e.g., mapping to Intel Loihi) and on techniques to further lower loss and stabilize surrogate gradients.

Hey everyone. I’m an 18yo indie dev, and I’ve been experimenting with Spiking Neural Networks (SNNs) for language modeling. A lot of papers (like SpikeBERT) mention that training 1B+ SNNs directly from random initialization fails due to vanishing gradients, so people usually do ANN-to-SNN conversion or distillation. I wanted to see if I could force it to converge purely in the spike domain. I had to stop at 27k steps because my wallet is literally empty lol, but the loss converged to 4.4.
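For anyone who hasn't worked with SNNs: "purely in the spike domain" means every layer communicates with binary spikes driven by leaky integrate-and-fire (LIF) dynamics, rather than training a dense ANN and converting it. A minimal numpy sketch of one LIF step (illustrative only, not the author's actual layer code):

```python
import numpy as np

def lif_step(v, x, beta=0.9, threshold=1.0):
    """One leaky integrate-and-fire update.

    v: membrane potentials (array), x: input currents (array).
    Returns updated potentials and the binary spike vector.
    """
    v = beta * v + x                          # leaky integration of input
    spikes = (v >= threshold).astype(np.float32)  # hard threshold -> binary spikes
    v = v - spikes * threshold                # soft reset for neurons that fired
    return v, spikes

v = np.zeros(3)
x = np.array([0.5, 1.2, 2.0])
v, spikes = lif_step(v, x)
print(spikes)  # -> [0. 1. 1.]
```

The hard threshold is exactly why gradients vanish at scale: its derivative is zero almost everywhere, which is what surrogate gradients (mentioned below) work around.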

Here are the most interesting things that happened:

  1. Massive Sparsity: It maintains ~93% sparsity. Only about 7% of neurons fire per token. It's incredibly cheap on memory during inference compared to dense models.
  2. Cross-lingual emergence: Around step 25K, it randomly started generating structurally correct Russian text, even though it wasn't explicitly targeted/weighted for it in the dataset mix.
  3. Memory routing shift: As I scaled the architecture from 600M to 1B, the model spontaneously shifted 39% of its activation routing into the persistent memory module. It basically learned on its own that memory is more valuable at larger scale.

Limitations (Being honest):
The text generation is still janky and nowhere near GPT-2 fluency yet. The loss (4.4) is high, mostly because I couldn't train it longer. But proving that a 1B pure SNN can converge from random init feels like a solid milestone.

I'm sharing this because I'd love some harsh technical feedback.

  1. Does anyone here have experience with neuromorphic hardware? Would an architecture like this map well to Loihi?
  2. If anyone has tips on pushing SNN loss lower or stabilizing surrogate gradients further, I'm all ears.
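On point 2, for anyone who wants to weigh in: the standard trick is to keep the hard Heaviside spike in the forward pass but swap in a smooth derivative (e.g., a fast-sigmoid) in the backward pass. A minimal numpy sketch of that forward/backward pair, with the steepness `k` being the usual knob for stability (values and names are illustrative, not the repo's):

```python
import numpy as np

def heaviside(v, threshold=1.0):
    """Forward pass: hard binary spike."""
    return (v >= threshold).astype(np.float32)

def fast_sigmoid_grad(v, threshold=1.0, k=10.0):
    """Backward-pass surrogate: derivative of the fast sigmoid
    k*x / (1 + |k*x|), peaked at the threshold.

    Lowering k flattens the surrogate, which trades gradient
    magnitude for stability in deep stacks.
    """
    x = k * (v - threshold)
    return k / (1.0 + np.abs(x)) ** 2
```

In an autograd framework this pair would live in a custom backward function; annealing `k` upward over training (shallow surrogate early, sharper later) is one commonly suggested stabilization.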

The code, architecture details, and the 12GB full training checkpoint (weights + optimizer states) are on my GitHub: https://github.com/gtausa197-svg/-Project-Nord-Spiking-Neural-Network-Language-Model.git

submitted by /u/zemondza