I've been experimenting with Bonsai 8B, PrismML's true 1-bit model (every weight is literally 0 or 1, not ternary like BitNet). Since the weights are bits, the diff between two model behaviors is just an XOR mask, so I built a tool that searches for sparse XOR patches that modify model behavior. The basic idea: flip a row of weights, check whether the model got better at the target task without breaking anything else, and keep or revert the flip. The set of accepted flips is the patch.

The result, measured on held-out prompts the search never saw: 93 row flips, 0.007% of weights, ~1 KB. Zero inference overhead, because the patched model IS the model; no adapter runs per token. Patches apply in microseconds and revert with the same XOR. Key findings, across 8 experiments:

Why this only works on true 1-bit models: BitNet b1.58 uses ternary weights {-1, 0, +1} packed as 2 bits, and XOR on 2-bit encodings produces invalid states (e.g., XOR(01, 10) = 11 has no valid mapping). Bonsai is true binary: each weight is one bit, and XOR flips it cleanly from −scale to +scale. As far as I know, this is the first post-training adaptation method for true 1-bit LLMs.

The deployment angle: LoRA adapters are ~100 MB, add latency per token, and need weight reloading to swap. XOR patches are ~1 KB, apply in microseconds, and add zero inference cost. Imagine a library of domain patches hot-swapped on a phone: a thousand patches adds 1 MB to a 1.15 GB base model.

One person, no ML research background, M3 MacBook Air. Everything is open: toolkit, patches, and all 8 experiments reproduce in under 2 hours on any Apple Silicon Mac.

Repo: https://github.com/nikshepsvn/bankai
Paper: https://github.com/nikshepsvn/bankai/blob/master/paper/bankai.pdf

Would love feedback from anyone who wants to poke holes in this.
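The XOR mechanics described above can be sketched in a few lines. This is a minimal toy illustration, not the actual toolkit's code: the weight matrix, its shape, and `apply_patch` are hypothetical stand-ins, and the patch is represented simply as the set of rows to flip.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one binary weight matrix: each element is 0 or 1,
# decoded at inference time as -scale / +scale. (Shapes are arbitrary.)
W = rng.integers(0, 2, size=(8, 16), dtype=np.uint8)

# A "patch" here is just the set of rows to flip; the full XOR mask is
# all-ones on those rows and zero elsewhere, so storing the row indices
# alone keeps the patch tiny.
patch_rows = np.array([1, 4, 6])

def apply_patch(weights, rows):
    """XOR-flip every bit in the given rows. XOR is an involution:
    applying the same patch twice restores the original weights."""
    patched = weights.copy()
    patched[rows] ^= 1
    return patched

W_patched = apply_patch(W, patch_rows)
W_restored = apply_patch(W_patched, patch_rows)

assert not np.array_equal(W, W_patched)   # the flipped rows changed
assert np.array_equal(W, W_restored)      # the same XOR reverts exactly
```

The same trick fails on ternary 2-bit encodings: XOR-ing two valid 2-bit codes can land on a bit pattern that decodes to nothing, which is why the post restricts the method to true 1-bit weights.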
Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.
Reddit r/LocalLLaMA / 4/3/2026
Key Points
- The author proposes “Bankai (卍解)”, a post-training adaptation method for true 1-bit LLMs that searches for sparse XOR “patches” over binary weights to improve a target task without adding inference overhead.
- Because weights are strictly 0/1 in true 1-bit models (unlike ternary/packed variants), the behavioral difference between patch and base model can be represented as an XOR mask, making patches fully reversible by applying the same XOR again.
- Experiments with Bonsai 8B report that only 93 row flips (about 0.007% of weights) improve held-out task behavior while preserving other capabilities; the author also notes that flipping high-scale rows is more impactful than flipping random rows.
- The method includes findings on generalization (patches trained on more diverse probes transfer better to unseen prompts), patch stacking behavior (mechanically order-independent but with partial cancellation), and no degradation trend on a subset of GSM8K.
- The work highlights deployment advantages over LoRA-style adapters: microsecond patch application, ~1 KB patch size, and instant hot-swapping (e.g., a library of domain-specific patches on a phone), all enabled specifically by the true 1-bit weight structure.
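The flip-evaluate-keep-or-revert loop from the key points can be sketched as a greedy search. This is a toy illustration under loud assumptions: `score` is a hypothetical proxy for probe-task accuracy (the real tool would run the Bonsai model on probe prompts, and would also check a held-out capability score before accepting a flip, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "model": 32 binary rows of 16 weights each, scored by bitwise
# agreement with a hidden target pattern (hypothetical stand-in).
target = rng.integers(0, 2, size=(32, 16), dtype=np.uint8)
base = rng.integers(0, 2, size=(32, 16), dtype=np.uint8)

def score(weights):
    return float((weights == target).mean())  # fraction of agreeing bits

W = base.copy()
accepted = []                  # the patch = the list of accepted row flips
best = score(W)
for row in range(W.shape[0]):
    W[row] ^= 1                # candidate flip: invert every bit in the row
    s = score(W)
    if s > best:               # flip improves the target task: keep it
        best = s
        accepted.append(row)
    else:                      # no improvement: revert with the same XOR
        W[row] ^= 1

# Greedy acceptance never decreases the score, and XOR-ing the accepted
# rows once more recovers the base weights exactly (reversible patch).
patched = W.copy()
patched[accepted] ^= 1
assert best >= score(base)
assert np.array_equal(patched, base)
```

The greedy structure also explains the stacking observation in the key points: each accepted flip is an independent XOR, so applying flips in any order yields the same weights, while two patches that touch the same row partially cancel.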


