PrismML debuts energy-sipping 1-bit LLM in bid to free AI from the cloud

The Register / 4/4/2026

Key Points

  • PrismML has debuted Bonsai 8B, a 1-bit LLM positioned as both competitive with other 8B models and significantly smaller and more energy efficient.

Bonasi 8B model is competitive with other 8B models but 14x smaller and 5x more energy efficient

Sat 4 Apr 2026 // 08:09 UTC

PrismML, an AI venture out of Caltech, has released a 1-bit large language model that it says holds its own against much weightier models, with the expectation that it will improve AI efficiency and viability on mobile devices, among other applications.

The model, dubbed Bonsai 8B, manages to be small and fast, with modest power demands and benchmark performance characteristics that rival much larger models.

"Our first proof point is 1-bit Bonsai 8B, a 1-bit model that fits into 1.15 GB of memory and delivers over 10x the intelligence density of its full-precision counterparts," the company said in a social media post. "It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter-class."

AI models based on the Transformer architecture involve neural networks with millions or billions of weights, which control the strength of connections between neurons and influence how the model performs tasks. They're set during the training process and they take up memory space based on the precision used to represent them.

A model quantized at GGUF FP16 (16 bits) will take up much more space than one quantized at GGUF Q8_0 (8 bits), Q4_0 (4 bits), or Q2_K (2 bits). That's excluding metadata and overhead that might increase the actual storage required. But given the same basic architecture, 16-bit models generally perform better than models quantized at lower levels.
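As a rough sketch of why precision dominates footprint, weights-only storage can be estimated as parameters × bits ÷ 8. This ignores the metadata and overhead mentioned above (and GGUF quant types such as Q4_0 in practice store slightly more than their nominal bits per weight), but it shows the scale of the savings:

```python
PARAMS = 8e9  # an 8-billion-parameter model

def weight_gigabytes(bits_per_weight: float) -> float:
    """Weights-only storage at a given precision, in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

for label, bits in [("FP16", 16), ("Q8_0", 8), ("Q4_0", 4), ("Q2_K", 2), ("1-bit", 1)]:
    print(f"{label:>5}: ~{weight_gigabytes(bits):.0f} GB")  # FP16 lands at ~16 GB, 1-bit at ~1 GB
```

The 1-bit row comes out around 1 GB, which is in the same ballpark as Bonsai 8B's quoted 1.15 GB once shared scale factors and metadata are added on top.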

PrismML's Bonsai model family is based on an architecture where, instead of storing each weight as a 16-bit or 32-bit floating point number, "each weight is represented only by its sign, {−1, +1}, while a shared scale factor is stored for each group of weights," as explained in the company's white paper [PDF]. Researchers have been working on improved approaches to quantization for many years, described in papers like "BitNet: Bit-Regularized Deep Neural Networks" (2017) and "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (2024).
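In code, that representation amounts to one sign per weight plus one shared scale per group. A minimal sketch, using each group's mean absolute value as the scale (a common choice in the binary-network literature; the article doesn't say how PrismML derives its scales):

```python
def quantize_1bit(weights, group_size=4):
    """Binarize weights to {-1, +1}, keeping one shared scale per group."""
    signs, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        signs.append([1 if w >= 0 else -1 for w in group])
        # Mean absolute value as the shared scale (illustrative choice).
        scales.append(sum(abs(w) for w in group) / len(group))
    return signs, scales

def dequantize(signs, scales):
    """Reconstruct approximate weights as sign * group scale."""
    return [s * scale for group, scale in zip(signs, scales) for s in group]

w = [0.4, -0.2, 0.1, -0.3]
signs, scales = quantize_1bit(w)
print(signs)                      # [[1, -1, 1, -1]]
print(scales)                     # one scale for the group of four
print(dequantize(signs, scales))  # every weight becomes ±scale
```

Storing one bit per weight plus one small float per group is what pushes the effective footprint just above 1 bit per weight rather than exactly 1.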

PrismML's approach is based on work done by Caltech electrical engineering professor Babak Hassibi and colleagues. The company claims that its 1-bit architecture avoids the tradeoffs that historically have accompanied low-bit quantization, specifically poor instruction following, errant multi-step reasoning, and unreliable tool use.

"We spent years developing the mathematical theory required to compress a neural network without losing its reasoning capabilities," said Babak Hassibi, CEO and founder of PrismML, in a statement. "We see 1-bit not as an endpoint, but as a starting point."

Hassibi argues that the company's 1-bit architecture establishes a new paradigm for AI that's focused on intelligence per unit of compute and energy.

To encourage others to think along these lines – remember when performance-per-watt became a thing? – PrismML proposes the measurement of intelligence density, a metric that shows its models in a good light.

"We define intelligence density as the negative of the log of the model's average error rate (across the same benchmark suite) divided by the model size," the company explains.
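Taken at face value, that definition is a one-liner. The log base isn't specified in the article, so natural log is assumed below, and the error rates are illustrative numbers, not figures from PrismML:

```python
import math

def intelligence_density(avg_error_rate: float, size_gb: float) -> float:
    """-log(average benchmark error rate) / model size in GB.
    Log base is not stated in the article; natural log is assumed."""
    return -math.log(avg_error_rate) / size_gb

# Illustrative only: because size sits in the denominator, a much
# smaller model can score far higher even with a larger error rate.
print(intelligence_density(0.30, 1.15))  # ~1.05 for a 1.15 GB model
print(intelligence_density(0.20, 16.0))  # ~0.10 for a 16 GB model
```

Note how the metric rewards shrinking the model roughly linearly, while improving accuracy only helps logarithmically.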

Assessed for intelligence density, Qwen3 8B, which comes out slightly ahead of Bonsai 8B on various benchmarks (MMLU Redux, MuSR, GSM8K, etc.), scores just 0.10/GB, far short of Bonsai 8B's 1.06/GB.

Metrics may matter for marketing, but the more meaningful yardstick for PrismML's models is their potential to move AI out of cloud datacenters. The company foresees its models powering on-device agents, real-time robotics, secure enterprise systems, and other projects where memory bandwidth, power, or compliance constraints can hinder deployment.

"1-bit Bonsai 8B runs natively on Apple devices (Mac, iPhone, iPad) via MLX, on Nvidia GPUs via llama.cpp CUDA," the company says. "Model weights are available today under the Apache 2.0 License."

Two smaller models are also available: 1-bit Bonsai 4B and 1-bit Bonsai 1.7B. ®
