I've been experimenting with Bonsai 8B, PrismML's true 1-bit model (every weight is literally 0 or 1, not ternary like BitNet). Since the weights are bits, the diff between two model behaviors is just an XOR mask, so I built a tool that searches for sparse XOR patches that modify model behavior. The basic idea: flip a row of weights, check whether the model got better at the target task without breaking anything else, and keep or revert the flip. The set of accepted flips is the patch.

The result, measured on held-out prompts the search never saw: 93 row flips, 0.007% of weights, ~1 KB. Zero inference overhead, because the patched model IS the model; no adapter runs per token. Patches apply in microseconds and revert with the same XOR. Key findings, across 8 experiments:

Why this only works on true 1-bit models: BitNet b1.58 uses ternary weights {-1, 0, +1} packed as 2 bits, and XOR on 2-bit encodings produces invalid states (e.g., XOR(01, 10) = 11 has no valid mapping). Bonsai is true binary: each weight is one bit, and XOR flips it cleanly from −scale to +scale. As far as I know, this is the first post-training adaptation method for true 1-bit LLMs.

The deployment angle: LoRA adapters are ~100 MB, add latency per token, and need weight reloading to swap. XOR patches are ~1 KB, apply in microseconds, and add zero inference cost. Imagine a library of domain patches hot-swapped on a phone: a thousand patches adds 1 MB to a 1.15 GB base model.

One person, no ML research background, M3 MacBook Air. Everything is open: toolkit, patches, and all 8 experiments reproduce in under 2 hours on any Apple Silicon Mac.

Repo: https://github.com/nikshepsvn/bankai
Paper: https://github.com/nikshepsvn/bankai/blob/master/paper/bankai.pdf

Would love feedback from anyone who wants to poke holes in this.
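The XOR mechanics described above can be sketched in a few lines. This is a minimal toy illustration, not the actual toolkit's code: the weight matrix, its shape, and `apply_patch` are hypothetical stand-ins, and the patch is represented simply as the set of rows to flip.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one binary weight matrix: each element is 0 or 1,
# decoded at inference time as -scale / +scale. (Shapes are arbitrary.)
W = rng.integers(0, 2, size=(8, 16), dtype=np.uint8)

# A "patch" here is just the set of rows to flip; the full XOR mask is
# all-ones on those rows and zero elsewhere, so storing the row indices
# alone keeps the patch tiny.
patch_rows = np.array([1, 4, 6])

def apply_patch(weights, rows):
    """XOR-flip every bit in the given rows. XOR is an involution:
    applying the same patch twice restores the original weights."""
    patched = weights.copy()
    patched[rows] ^= 1
    return patched

W_patched = apply_patch(W, patch_rows)
W_restored = apply_patch(W_patched, patch_rows)

assert not np.array_equal(W, W_patched)   # the flipped rows changed
assert np.array_equal(W, W_restored)      # the same XOR reverts exactly
```

The same trick fails on ternary 2-bit encodings: XOR-ing two valid 2-bit codes can land on a bit pattern that decodes to nothing, which is why the post restricts the method to true 1-bit weights.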
Bankai (卍解) — the first post-training adaptation method for true 1-bit LLMs.
Reddit r/LocalLLaMA / 4/3/2026
Key Points
- The author proposes “Bankai (卍解)”, a post-training adaptation method for true 1-bit LLMs that searches for sparse XOR “patches” over binary weights to improve a target task without adding inference overhead.
- Because weights are strictly 0/1 in true 1-bit models (unlike ternary/packed variants), the behavioral difference between patch and base model can be represented as an XOR mask, making patches fully reversible by applying the same XOR again.
- Experiments with Bonsai 8B report that only 93 row flips (about 0.007% of weights) improve held-out task behavior while preserving other capabilities; the author also notes that flipping high-scale rows is more impactful than flipping random rows.
- The method includes findings on generalization (patches trained on more diverse probes transfer better to unseen prompts), patch stacking behavior (mechanically order-independent but with partial cancellation), and no degradation trend on a subset of GSM8K.
- The work highlights deployment advantages over LoRA-style adapters: microsecond patch application, ~1 KB patch size, and instant hot-swapping (e.g., a library of domain-specific patches on a phone), all enabled specifically by the true 1-bit weight structure.
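The flip-evaluate-keep-or-revert loop from the key points can be sketched as a greedy search. This is a toy illustration under loud assumptions: `score` is a hypothetical proxy for probe-task accuracy (the real tool would run the Bonsai model on probe prompts, and would also check a held-out capability score before accepting a flip, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "model": 32 binary rows of 16 weights each, scored by bitwise
# agreement with a hidden target pattern (hypothetical stand-in).
target = rng.integers(0, 2, size=(32, 16), dtype=np.uint8)
base = rng.integers(0, 2, size=(32, 16), dtype=np.uint8)

def score(weights):
    return float((weights == target).mean())  # fraction of agreeing bits

W = base.copy()
accepted = []                  # the patch = the list of accepted row flips
best = score(W)
for row in range(W.shape[0]):
    W[row] ^= 1                # candidate flip: invert every bit in the row
    s = score(W)
    if s > best:               # flip improves the target task: keep it
        best = s
        accepted.append(row)
    else:                      # no improvement: revert with the same XOR
        W[row] ^= 1

# Greedy acceptance never decreases the score, and XOR-ing the accepted
# rows once more recovers the base weights exactly (reversible patch).
patched = W.copy()
patched[accepted] ^= 1
assert best >= score(base)
assert np.array_equal(patched, base)
```

The greedy structure also explains the stacking observation in the key points: each accepted flip is an independent XOR, so applying flips in any order yields the same weights, while two patches that touch the same row partially cancel.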


