When are we gonna get more 1-bit models (Medium & Large size)?

Reddit r/LocalLLaMA / 4/10/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research

Key Points

  • The post suggests interest in more “1-bit” (bitnet-style/quantized) LLMs after recent releases like Prism ML’s Bonsai 8B, with early user feedback noting occasional hallucinations.
  • It references discussions about running quantized models on very small GPUs, highlighting simulation efforts (e.g., for Qwen3.5) and the broader appeal of making large models feasible on limited hardware.
  • The author provides a parameter-to-memory size ratio table (e.g., 8B→1.5GB, 50B→9.375GB, 120B→22.5GB), arguing that wider availability of 1-bit variants could enable 50B+ models on modest VRAM.
  • The message is largely speculative and community-driven, inviting others to share whether they are “cooking” similar 1-bit or quantized model projects.

Obviously, this thought came after Prism ML's recent Bonsai 8B model.

This thread seems like honest feedback on the Bonsai-8B model. A few people mentioned that hallucination happened a few times. Hope future 1-bit models come with more improvements.

There's a recent thread on simulation for Qwen3.5 models. That looks awesome for tiny GPUs. I also mentioned the size ratio for medium/big/large models (in some other thread), which seems nice. Pasting the size ratio below; a quick script after the list sanity-checks the arithmetic.

(Parameters in billions : Size in GB — the 8B → 1.5GB row implies 1.5 bits per weight, i.e. 0.1875 GB per billion parameters)

  • 8 : 1.5 (Bonsai 8B)
  • 30 : 5.625
  • 50 : 9.375
  • 70 : 13.125
  • 100 : 18.75
  • 120 : 22.5 (Qwen3.5-122B, GLM-4.5-Air, Step-3.5-Flash, Devstral-2-123B, Mistral-Small-4-119B)
  • 200 : 37.5
  • 250 : 46.875 (MiniMax-M2.5, Qwen3-235B-A22B)
  • 300 : 56.25 (GLM-4.7, Qwen3.5-397B-A17B, MiMo-V2-Flash, Trinity-Large-Thinking)
  • 400 : 75 (Llama-3.1-405B, Qwen3-Coder-480B-A35B, Llama-4-Maverick-17B-128E)
  • 500 : 93.75 (LongCat-Flash-Chat)
  • 600 : 112.5 (DeepSeek-V3/R1, Mistral-Large-3-675B)
  • 700 : 131.25 (GLM-5, GigaChat3.1-702B-A36B)
  • 1000 : 187.5 (Kimi-K2.5, Ling-2.5-1T, Ring-2.5-1T)
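For anyone who wants to check the ratio themselves, here's a minimal Python sketch of the arithmetic behind the table, assuming weights are stored at 1.5 bits each (which is what the 8B → 1.5GB row implies); real BitNet-style checkpoints add some overhead for embeddings and scales on top of this:

```python
# Arithmetic behind the table above: at 1.5 bits per weight,
# size = params * 1.5 bits / 8 bits-per-byte, i.e. 0.1875 GB per billion params.
# This counts packed weights only; embeddings, scales, and runtime buffers add more.

BITS_PER_WEIGHT = 1.5  # assumption implied by the 8B -> 1.5GB row

def size_gb(params_billions: float) -> float:
    """Approximate 1-bit model weight size in GB (decimal GB)."""
    return params_billions * BITS_PER_WEIGHT / 8

for p in (8, 30, 50, 70, 100, 120, 200, 250, 300, 400, 500, 600, 700, 1000):
    print(f"{p:>5}B -> {size_gb(p):8.3f} GB")
```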

Wouldn't it be nice to have more 1-bit models in the above sizes? Per the table, a 50B model's weights would fit in under 10GB of VRAM, a 100B model's in under 24GB, ..... which seems like a miracle.
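To make the "fits in VRAM" claim concrete, here's a hedged fit check using the same 0.1875 GB-per-billion-params ratio as the table; the 2GB `overhead_gb` headroom for KV cache and activations is a rough assumption of mine, not a measured number:

```python
# Rough feasibility check: packed weights (from the table's ratio) plus an
# assumed ~2GB of headroom for KV cache and activations must fit in VRAM.

def weights_gb(params_billions: float) -> float:
    return params_billions * 0.1875  # 1.5 bits per weight / 8 bits per byte

def fits(params_billions: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    return weights_gb(params_billions) + overhead_gb <= vram_gb

for p, vram in [(50, 12), (100, 24), (120, 24), (250, 48)]:
    print(f"{p}B in {vram}GB VRAM: {'fits' if fits(p, vram) else 'too big'}")
```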

Our dude is cooking something for us. Hope we get some soon.

Qwen 3 8B. I’m cooking the 397B right now, since you guys have such an appetite for bitnets. - u/Party-Special-5177

Anyone else cooking something like this? Please share.

submitted by /u/pmttyji