Quantization from the ground up

Simon Willison's Blog



26th March 2026 - Link Blog

Quantization from the ground up. Sam Rose continues his streak of publishing spectacularly informative interactive essays, this time explaining how quantization of Large Language Models works (which he says might be "the best post I've ever made").

Also included is the best visual explanation I've ever seen of how floating point numbers are represented using binary digits.

Screenshot of an interactive float32 binary representation tool showing the value -48.92364502, with color-coded bit fields labeled S (sign), EXPONENT (blue), and SIGNIFICAND (pink), displaying the 32-bit pattern 11000010010000111101100001110100000, and a slider control at the bottom along with minus, plus, and reset buttons.
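The decomposition shown in that tool is easy to reproduce in a few lines of Python. This sketch is mine, not from Sam's essay; it splits a float32 into the same sign, exponent, and significand fields:

```python
import struct

def float32_bits(x):
    """Decompose a float into the three float32 bit fields."""
    # Pack as a big-endian 32-bit float, then read the raw bits back.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                 # 1 bit
    exponent = (raw >> 23) & 0xFF    # 8 bits, biased by 127
    significand = raw & 0x7FFFFF     # 23 bits, implicit leading 1
    # Reconstruct the value from the fields (valid for normal numbers
    # only, i.e. exponent not 0 or 255):
    value = (-1) ** sign * (1 + significand / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, significand, value

s, e, m, v = float32_bits(-48.92364502)
print(s, e, f"{m:023b}")
print(v)  # close to -48.92364502, up to float32 rounding
```

For the value in the screenshot, the sign bit is 1 (negative) and the biased exponent is 132, i.e. 2^5 = 32, which is the largest power of two below 48.92.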

I hadn't heard about outlier values in quantization - rare float values that exist outside of the normal tiny-value distribution - but apparently they're very important:

Why do these outliers exist? [...] tl;dr: no one conclusively knows, but a small fraction of these outliers are very important to model quality. Removing even a single "super weight," as Apple calls them, can cause the model to output complete gibberish.

Given their importance, real-world quantization schemes sometimes do extra work to preserve these outliers. They might do this by not quantizing them at all, or by saving their location and value into a separate table, then removing them so that their block isn't destroyed.
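The "separate table" trick can be sketched in a toy form. This is my own illustration of the idea, not code from the essay; the threshold and function names are made up, and real schemes are considerably more sophisticated:

```python
import numpy as np

def quantize_block(block, threshold=5.0):
    """Toy int8 absmax quantization that side-tables outliers.

    Outliers (|w| > threshold) are recorded separately and zeroed out
    so they don't blow up the block's scale factor and crush the
    precision of every other weight in the block.
    """
    block = block.copy()
    outlier_idx = np.nonzero(np.abs(block) > threshold)[0]
    outliers = block[outlier_idx].copy()   # keep exact values
    block[outlier_idx] = 0.0               # remove them from the block
    scale = max(np.abs(block).max() / 127, 1e-12)
    q = np.round(block / scale).astype(np.int8)
    return q, scale, outlier_idx, outliers

def dequantize_block(q, scale, outlier_idx, outliers):
    w = q.astype(np.float32) * scale
    w[outlier_idx] = outliers              # restore outliers exactly
    return w
```

Without the side table, a single weight of 8.0 in a block of ~0.01-magnitude values would force a scale of 8/127 ≈ 0.063, rounding every small weight in the block to zero.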

Plus there's a section on "How much does quantization affect model accuracy?" Sam explains the concepts of perplexity and **KL divergence**, then uses the llama.cpp perplexity tool and a run of the GPQA benchmark to show how different quantization levels affect Qwen 3.5 9B.
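The two metrics are simple to state. Here's a minimal sketch of both on toy next-token probabilities (my own illustration, not how llama.cpp implements them):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability the
    model assigned to each observed token. Lower is better."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def kl_divergence(p, q):
    """KL(P || Q) in nats: how much the quantized model's next-token
    distribution q diverges from the full-precision distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A model that assigns each observed token probability 0.25 is as
# uncertain as a uniform guess among 4 tokens:
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0

# Identical distributions have zero divergence; any quantization drift
# shows up as a positive value:
print(kl_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))   # 0.0
print(kl_divergence([0.7, 0.2, 0.1], [0.6, 0.25, 0.15])) # > 0
```

KL divergence is the more direct probe of quantization damage: it compares the quantized model's output distribution against the full-precision model's, token by token, rather than against the text alone.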

His conclusion:

It looks like 16-bit to 8-bit carries almost no quality penalty. 16-bit to 4-bit is more noticeable, but it's certainly not a quarter as good as the original. Closer to 90%, depending on how you want to measure it.

Posted 26th March 2026 at 4:21 pm


Tags: computer-science, ai, explorables, generative-ai, llms, sam-rose, qwen
