[Project] PentaNet: Pushing beyond BitNet with Native Pentanary {-2, -1, 0, 1, 2} Quantization (124M, zero-multiplier inference)

Reddit r/MachineLearning / 3/28/2026

Key Points

  • PentaNet is an extreme-quantization variant of BitNet that expands weights from ternary {-1,0,1} to pentanary {-2,-1,0,+1,+2} while keeping “zero-multiplier” inference by implementing ±2 via bit-shifts instead of multipliers.
  • In a head-to-head comparison of two 124M-parameter GPT-2-sized models on WikiText-103 with identical compute budgets and three random seeds, PentaNet reportedly achieves about a 6.4% perplexity improvement over the BitNet-style baseline.
  • The training approach uses a Straight-Through Estimator (STE) and the author reports it remains stable, with weight buckets for ±2 not collapsing back to a ternary distribution.
  • The post includes qualitative evidence that PentaNet produces more fluent text and avoids excessive <unk> collapse relative to the BitNet model in an example generation.
  • The author provides open-sourced artifacts, including a PyTorch PentaLinear layer implementation, training code, and a technical draft, plus a Hugging Face weights link for the 124M model.

Hey everyone,

I've been experimenting with extreme LLM quantization following the BitNet b1.58 paper. While ternary quantization {-1, 0, 1} is great for replacing costly matrix multiplications with simple additions, I wondered if we were leaving too much model capacity on the table by restricting the weights that severely.

So, I built and trained PentaNet from scratch — a custom architecture that expands the weight states to pentanary: {-2, -1, 0, +1, +2}.

Why ±2? Because multiplying by 2 doesn't require a hardware multiplier! It’s just a left bit-shift (x << 1). This means PentaNet completely preserves the "zero-multiplier" inference benefit of BitNet, while giving the network 47% more information per weight (log₂(5) ≈ 2.32 bits vs log₂(3) ≈ 1.58 bits for ternary) to encode knowledge.
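To make the zero-multiplier claim concrete, here is a minimal sketch of a dot product over pentanary weights that uses only additions, negations, and a single left shift. The helper name `penta_dot` and the loop structure are illustrative, not code from the PentaNet repo:

```python
def penta_dot(weights, xs):
    """Accumulate sum(w * x) over pentanary weights {-2,-1,0,+1,+2}
    without ever invoking a multiplier."""
    acc = 0
    for w, x in zip(weights, xs):
        if w == 0:
            continue                          # zero weights are skipped outright
        term = x << 1 if abs(w) == 2 else x   # |w| == 2 -> one left bit-shift
        acc += term if w > 0 else -term       # sign handled by add/subtract
    return acc

print(penta_dot([-2, -1, 0, 1, 2], [3, 5, 7, 9, 11]))  # -6 - 5 + 9 + 22 = 20
```

In hardware terms, each weight costs at most a shift and an add, exactly like the ternary case, which is why the extra ±2 states come essentially for free at inference time.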

📊 The Benchmark

I trained two 124M parameter models (GPT-2 architecture) on WikiText-103 using exactly the same compute budget and setup to compare them head-to-head. To reduce seed-to-seed variance, I ran 3 independent seeds for each.

Results (WikiText-103): PentaNet came out roughly 6.4% lower in perplexity than the BitNet-style baseline, essentially for "free" in terms of compute overhead, and the Straight-Through Estimator (STE) remained perfectly stable.
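For readers unfamiliar with STE training at these bit widths, here is a minimal sketch of what a pentanary STE quantizer might look like. I'm assuming the absmean scaling scheme from BitNet b1.58, widened to a ±2 clip; the actual PentaNet code may differ:

```python
import torch

def penta_quantize_ste(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Quantize weights to {-2,-1,0,+1,+2} * scale with a straight-through
    estimator. Scaling scheme is an assumption (absmean, as in BitNet b1.58)."""
    scale = w.abs().mean().clamp(min=eps)
    q = (w / scale).round().clamp(-2, 2) * scale
    # Forward pass sees the quantized q; backward treats the op as identity,
    # so gradients reach the latent full-precision weights unchanged.
    return w + (q - w).detach()

w = torch.randn(64, 64, requires_grad=True)
w_q = penta_quantize_ste(w)
w_q.sum().backward()  # gradients flow straight through to w
```

The `w + (q - w).detach()` trick is the standard way to express an STE in PyTorch: the forward value equals `q`, while autograd sees the identity function.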

🧬 Weight Distribution & Non-Collapse

One of my biggest fears was that the model would just ignore the ±2 buckets and silently collapse back into a ternary BitNet. I tracked the buckets during training, and they actually stabilize perfectly:
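Tracking that kind of collapse is cheap to do during training. A hypothetical helper (not from the repo) that reports the fraction of weights in each pentanary bucket could look like this:

```python
from collections import Counter

def bucket_fractions(quantized_weights):
    """Fraction of weights landing in each pentanary bucket {-2,-1,0,+1,+2}."""
    counts = Counter(quantized_weights)
    total = len(quantized_weights)
    return {level: counts.get(level, 0) / total for level in (-2, -1, 0, 1, 2)}

fracs = bucket_fractions([-2, -1, 0, 0, 1, 2, 2, 0])
# If the +/-2 fractions stay well above zero across training steps,
# the model has not collapsed back to a ternary distribution.
```

Logging these five fractions every few hundred steps is enough to catch a silent ternary collapse early.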

🗣️ Text Generation Example

The PPL difference sounds small on paper, but at 124M parameters, it's the difference between stuttering and coherent English. Here is an uncurated sample from seed 42 (Prompt: "The history of the internet began with"):

BitNet:

The history of the internet began with the <unk> to be a way , <unk> , which was the first recent of the <unk> , and the city and the <unk> . The French army was the first to be the first @-@ scale

PentaNet:

The history of the internet began with the original level of the other . The term of the original world was to the public court of the United States in July 2013 in February 15 , 2015 , as well as the team of $ 2 @,@ 000 . In the same year , the

(Obviously factually hallucinated since it's a tiny model trained for 20 mins, but notice how PentaNet actually learned fluent grammar and avoids <unk> collapse!).

🔗 Links & Code

I've open-sourced the training code, the PyTorch PentaLinear layer implementation, and the NeurIPS-style technical draft.

Right now, the PyTorch layer simulates the quantization for training. The next logical step would be writing custom Triton/CUDA kernels to actually leverage the bit-shift operations for real-world speedups.
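For anyone curious what "simulating the quantization" means in practice, a fake-quantization linear layer typically folds the STE into its forward pass. The class name `PentaLinear` matches the post, but this body is my illustrative guess, not the open-sourced implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PentaLinear(nn.Linear):
    """Linear layer with simulated ("fake") pentanary weight quantization.
    Illustrative sketch; scaling scheme (absmean) is an assumption."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)        # absmean scale (assumed)
        q = (w / scale).round().clamp(-2, 2) * scale  # snap to {-2..2} * scale
        w_q = w + (q - w).detach()                    # straight-through estimator
        return F.linear(x, w_q, self.bias)

layer = PentaLinear(16, 8)
y = layer(torch.randn(4, 16))
```

A real speedup would indeed need custom kernels: the layer above still runs a dense float matmul under the hood, so it measures quality, not latency.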

Would love to hear your thoughts, especially if anyone here has experience writing low-level kernels for this kind of quantized inference!

submitted by /u/kyworn
