[R] First open-source implementation of Hebbian fast-weight write-back for the BDH architecture

Reddit r/MachineLearning / 3/29/2026


Key Points

  • The post introduces the first open-source implementation of Hebbian fast-weight write-back for the BDH (Dragon Hatchling) architecture, previously described in a paper but not publicly implemented.
  • It describes an inference-time mechanism where the model rewrites decoder weights using sparse activation codes as addresses, and reports that the behavior is consistent across token positions.
  • The author adds “consolidation” experiments to test whether fast weights can be written back into slow weights without degrading performance, finding that dense write-back significantly reduces accuracy.
  • Selective write-back—updating only the top 10% of rows by episode activity—preserves performance close to the no-consolidation control results.
  • The work is validated on a synthetic n-back associative recall task with a ~25M parameter model (including independent H100 runs and seeds), and the author notes limitations such as lack of validation on natural language and a proposed next step using FineWeb-Edu.

The BDH (Dragon Hatchling) paper (arXiv:2509.26507) describes a Hebbian synaptic plasticity mechanism in which model weights update during inference. The released code computes the co-activation product and then discards it; the write-back itself was never implemented publicly. I implemented it.

The model rewrites its own decoder weights during inference, using sparse activation codes as addresses. The same token always produces the same sparse code regardless of position.
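The write-back described above can be sketched as a Hebbian outer-product update, where the sparse code selects which columns of the decoder matrix get written. This is a minimal illustration, not the repo's implementation; the function name, shapes, and the learning rate `eta` are assumptions.

```python
import numpy as np

def hebbian_write_back(W_dec, x_sparse, y, eta=0.01):
    """Hebbian fast-weight update during inference (illustrative sketch).

    The sparse activation code x_sparse acts as an address: only the
    columns where x_sparse is nonzero receive a write. `eta` and all
    names here are hypothetical, not from the BDH repo.
    """
    # Outer product of post- and pre-synaptic activity (Hebbian rule)
    delta = eta * np.outer(y, x_sparse)
    W_dec += delta  # write back into the decoder weights in place
    return W_dec

# Toy usage: a 4x6 decoder, a sparse code with two active units
rng = np.random.default_rng(0)
W = np.zeros((4, 6))
x = np.zeros(6)
x[[1, 4]] = 1.0                     # sparse address: units 1 and 4 active
y = rng.standard_normal(4)          # post-synaptic activity
hebbian_write_back(W, x, y)
print(np.nonzero(np.abs(W).sum(axis=0))[0])  # only addressed columns are nonzero
```

Because the same token always yields the same sparse code, repeated writes to the same address accumulate, which is what makes the fast weights usable as an episodic store.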

Consolidation (v2): Once episodic fast weights work, the next question is whether you can write them back into slow weights without destroying the signal. Dense write-back degrades it. Selective write-back (top 10% of rows by episode activity) preserves most of it:

| | n2 | n4 | n8 |
|---|---|---|---|
| Control (no consolidation) | 97.2% | 95.5% | 97.4% |
| Dense write-back | 75.4% | 68.1% | 89.8% |
| Selective (row top-10%) | 97.5% | 97.1% | 96.2% |
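The selective variant in the table can be sketched like this: rank rows by a per-episode activity score and merge only the top fraction of fast-weight rows into the slow weights, discarding the rest. The function name, the activity metric, and the 10% threshold default are illustrative assumptions.

```python
import numpy as np

def consolidate_selective(W_slow, W_fast, activity, frac=0.10):
    """Write fast weights into slow weights for only the most active rows.

    `activity` is a per-row episode-activity score (here: summed absolute
    fast-weight magnitude); the top `frac` of rows are consolidated and
    the rest are dropped. All names/defaults are illustrative.
    """
    k = max(1, int(frac * W_slow.shape[0]))
    top_rows = np.argsort(activity)[-k:]     # rows with highest activity
    W_out = W_slow.copy()
    W_out[top_rows] += W_fast[top_rows]      # selective write-back
    return W_out

# Toy usage: 20 rows, only the 2 most active rows get consolidated
rng = np.random.default_rng(1)
W_slow = np.zeros((20, 8))
W_fast = rng.standard_normal((20, 8))
activity = np.abs(W_fast).sum(axis=1)
W_new = consolidate_selective(W_slow, W_fast, activity, frac=0.10)
print(int((np.abs(W_new).sum(axis=1) > 0).sum()))  # → 2
```

Dense write-back is the `frac=1.0` case, which is where the table shows the signal degrading.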

Verified on independent hardware (H100) and seed. Counter-benchmarks stay in the 91–95% range.

Base mechanism: The baseline without write-back gets ~1% (chance). The best Hebbian run hits 99.0 / 98.0 / 97.5 on n2/n4/n8, reproduced across independent seeds. Five bugs had to be fixed along the way; all are documented in the README.

Limitations: This is a mechanism proof on synthetic n-back associative recall. 25M parameter model. Not validated on natural language. Next step is FineWeb-Edu.
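For readers unfamiliar with the evaluation, here is one plausible reading of a synthetic n-back associative recall task: at each position the model must reproduce the token seen n steps earlier. The exact task spec lives in the repo; this generator is a hedged illustration, not the author's code.

```python
import numpy as np

def make_nback_recall(seq_len, n, vocab=64, seed=0):
    """Synthetic n-back recall data (illustrative assumption):
    the target at position t is the token from position t - n.
    Positions with no valid target are marked -1."""
    rng = np.random.default_rng(seed)
    tokens = rng.integers(0, vocab, size=seq_len)
    targets = np.full(seq_len, -1)
    targets[n:] = tokens[:-n]       # recall the token from n steps back
    return tokens, targets

# Toy usage: a length-10 sequence with n=2
toks, tgts = make_nback_recall(10, n=2, seed=0)
print(toks[:4], tgts[:4])           # first two targets are -1 (no history yet)
```

Chance accuracy on such a task is roughly 1/vocab, which matches the ~1% baseline reported above for a vocabulary on the order of 100 tokens.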

Repo (Apache 2.0): https://github.com/fleeb83/bdh-fast-weights

Independent researcher, no lab. Happy to answer any questions.

submitted by /u/fleebrun83