RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params)

Reddit r/LocalLLaMA / 3/26/2026


Key Points

  • RotorQuant proposes replacing TurboQuant’s dense random orthogonal matrix with Clifford-algebra rotors (Cl(3,0)) applied via a rotor “sandwich product” on 3D chunks of vectors to reduce compute and parameter count.
  • The method uses a fused CUDA kernel and fused Metal shader implementation that avoids memory round-trips, reportedly achieving 10–19× speedups on NVIDIA RTX PRO 4000 and 9–31× on Apple M4 for Qwen2.5-3B-Instruct KV-cache operations.
  • Reported quality is effectively unchanged versus TurboQuant, with cosine similarity around 0.990 vs 0.991 and “needle-in-haystack” retrieval success at 9/9 across bit-widths.
  • RotorQuant claims 44× fewer parameters (372 vs 16,399 for d=128) and notes a tradeoff: higher synthetic MSE on random unit vectors, mitigated via QJL correction with preserved real-model attention fidelity.

Kinda sounds ridiculous, but I reimagined/reinvented TurboQuant with Clifford-algebra vector quantization, implemented on both CUDA and Metal shaders:

https://github.com/tonbistudio/turboquant-pytorch/pull/4

https://github.com/TheTom/turboquant_plus/pull/34

https://preview.redd.it/mqwnea8iidrg1.png?width=2604&format=png&auto=webp&s=597710bff942ea68180f162ed147e134d33c9639

https://preview.redd.it/n9hjiq6iidrg1.png?width=2652&format=png&auto=webp&s=1ec464ada80dfff65ae7017ab9b834190ace2987

The idea: Replace the d×d random orthogonal matrix Π with Clifford rotors in Cl(3,0). Instead of a dense matmul (16,384 FMAs for d=128), chunk the vector into groups of 3 dims and rotate each with a 4-parameter rotor via the sandwich product RvR̃ (~100 FMAs total).
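The chunked sandwich product can be sketched in NumPy. This is a minimal illustration, not the fused kernel: for pure vectors in Cl(3,0), the rotor sandwich RvR̃ reduces to quaternion rotation, so each 3-dim chunk is rotated by one 4-parameter unit rotor. The function names (`quat_rotate`, `rotor_transform`) and the remainder handling are my own assumptions, not from the repo.

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z).
    For pure vectors this equals the Cl(3,0) rotor sandwich R v R~."""
    w, x, y, z = q
    u = np.array([x, y, z])
    # Rodrigues-style expansion of q v q*: ~18 FMAs per 3-dim chunk
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def rotor_transform(vec, rotors):
    """Apply one 4-parameter rotor per 3-dim chunk (hypothetical sketch;
    vec length must be 3 * len(rotors))."""
    out = np.empty_like(vec)
    for i, q in enumerate(rotors):
        out[3*i:3*i+3] = quat_rotate(q, vec[3*i:3*i+3])
    return out

rng = np.random.default_rng(0)
d = 126  # nearest multiple of 3 below d=128; real code must handle the remainder dims
rotors = rng.normal(size=(d // 3, 4))
rotors /= np.linalg.norm(rotors, axis=1, keepdims=True)  # normalize to unit rotors
v = rng.normal(size=d)
w = rotor_transform(v, rotors)
print(np.allclose(np.linalg.norm(w), np.linalg.norm(v)))  # True: rotation preserves norm
```

Note the parameter count intuition: 4 floats per chunk instead of a dense d×d matrix, which is where the large parameter reduction comes from.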

Results on Qwen2.5-3B-Instruct KV cache:

- Cosine similarity: 0.990 (vs TurboQuant's 0.991) — effectively identical
- 44× fewer parameters (372 vs 16,399 for d=128)
- Fused CUDA kernel: 10-19× faster than cuBLAS matmul on RTX PRO 4000
- Fused Metal shader: 9-31× faster on Apple M4
- Perfect 9/9 needle-in-haystack at all bit-widths

The key insight: for pure vectors, the rotor sandwich is equivalent to a sparse 3×3 rotation — the fused kernel keeps everything in registers with no memory round-trips, which is why it beats the BLAS GEMM despite TurboQuant's matmul being highly optimized.
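The "sparse 3×3 rotation" equivalence above can be checked numerically: a unit rotor converts to a standard 3×3 rotation matrix, and the sandwich product gives the identical result. A minimal sketch (the conversion formula is the standard quaternion-to-matrix identity, not code from the repo):

```python
import numpy as np

def quat_to_matrix(q):
    """3x3 rotation matrix equivalent to the rotor sandwich R v R~."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

rng = np.random.default_rng(1)
q = rng.normal(size=4)
q /= np.linalg.norm(q)          # unit rotor
R = quat_to_matrix(q)
v = rng.normal(size=3)

# sandwich product via the Rodrigues form
u, w0 = q[1:], q[0]
sandwich = v + 2.0 * np.cross(u, np.cross(u, v) + w0 * v)
print(np.allclose(R @ v, sandwich))  # True: same rotation either way
```

Because each chunk's rotation is independent, a fused kernel can hold the 4 rotor params and 3 vector components in registers for the whole computation, which is the claimed reason it beats a tuned GEMM despite the GEMM's raw FLOP throughput.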

The tradeoff is higher synthetic MSE on random unit vectors (the block-diagonal rotation doesn't induce the exact Beta distribution). But with QJL correction, real-model attention fidelity is identical — and sometimes better on top-1/top-5 retrieval.
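The tradeoff has a simple structural explanation worth demonstrating: a dense random orthogonal matrix mixes every input dimension into every output dimension, while a block-diagonal rotation only mixes within each 3-dim chunk, so the per-coordinate distribution after projection differs from the dense case. A small NumPy demo of that mixing difference (the matrices here are generic QR-based rotations, not RotorQuant's actual rotors):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 12

# Dense random orthogonal matrix (TurboQuant-style): QR of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Block-diagonal rotation (RotorQuant-style): independent 3x3 rotations per chunk
B = np.zeros((d, d))
for i in range(d // 3):
    blk, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    B[3*i:3*i+3, 3*i:3*i+3] = blk

v = np.zeros(d)
v[0] = 1.0  # a single basis vector
print(np.count_nonzero(np.abs(Q @ v) > 1e-12))  # ~12: spread across all dims
print(np.count_nonzero(np.abs(B @ v) > 1e-12))  # <=3: confined to one chunk
```

This confinement is what breaks the exact Beta distribution on random unit vectors; per the post, the QJL correction recovers real-model attention fidelity despite it.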

Paper: https://www.scrya.com/rotorquant/

Code: https://github.com/scrya-com/rotorquant

PDF: https://www.scrya.com/rotorquant.pdf

submitted by /u/Revolutionary_Ask154