INT3 compression + fused Metal kernels [R]

Reddit r/MachineLearning / 4/22/2026


Key Points

  • The researcher compresses models using INT3 quantization (reporting +0.14 nats) and pairs this with a newly built 2-bit KV cache to better support long-horizon tasks.
  • They have shipped an INT3-compressed model together with an INT2 KV cache implementation via custom fused Metal kernels optimized for Apple Silicon (M-series) Macs.
  • A Qwen 7B model is currently available in preview using this approach.
  • The project continues to optimize the kernels and is working on Triton-based GPU kernels for broader hardware support, with additional models planned.
  • The author invites feedback and asks the community which models (up to ~100B parameters) they should compress next, providing the Spiral repo for access and installation.

Hey guys, I am a researcher and solo founder. I compress models with INT3 at +0.14 nats and built a 2-bit KV cache for long-horizon tasks. I have shipped both (INT3 model + INT2 KV cache) with custom fused Metal kernels for Apple Silicon (M-series) Macs. Currently, Qwen 7B is available in preview.
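For readers unfamiliar with low-bit quantization, here is a minimal sketch of what group-wise 3-bit quantization generally looks like. This is not the author's actual method (which is unpublished here); the function names, group size, and symmetric 8-level mapping are all illustrative assumptions.

```python
# Hypothetical sketch of group-wise 3-bit quantization.
# NOT the Spiral implementation; group_size and the symmetric
# level mapping are assumptions for illustration only.

def quantize_int3(weights, group_size=32):
    """Map floats to 3-bit codes (0..7) with one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # absmax scale so the largest value lands on the outermost level
        scale = max(abs(w) for w in group) / 3.5 or 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale + 3.5)  # levels -3.5..+3.5 -> codes 0..7
            codes.append(min(7, max(0, q)))
    return codes, scales

def dequantize_int3(codes, scales, group_size=32):
    """Recover approximate floats from 3-bit codes and group scales."""
    return [(q - 3.5) * scales[i // group_size] for i, q in enumerate(codes)]
```

With an absmax scale like this, the round-trip error per weight is bounded by half the quantization step (scale / 2), which is the kind of error that shows up downstream as a small loss increase measured in nats.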

```
# install
brew install reinforceai/spiral/spiral

# chat
spiral-chat
```

I am optimizing the kernels further and working on Triton kernels for GPU support. There is still room to pack things more efficiently, and I will share more models soon. I would appreciate any feedback, and let me know which models (up to ~100B parameters) you want me to compress.
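To make the "packing" idea concrete: 2-bit codes (as in an INT2 KV cache) fit four to a byte. The sketch below shows the general bit-packing technique; the actual Spiral memory layout is not described in the post, so this layout is purely an assumption.

```python
# Illustrative 2-bit packing, four codes per byte, lowest bits first.
# NOT the Spiral kernel layout; shown only to demonstrate the technique.

def pack_int2(codes):
    """Pack 2-bit codes (0..3) into bytes, four per byte."""
    packed = bytearray()
    for start in range(0, len(codes), 4):
        byte = 0
        for i, q in enumerate(codes[start:start + 4]):
            byte |= (q & 0b11) << (2 * i)  # slot i occupies bits 2i..2i+1
        packed.append(byte)
    return bytes(packed)

def unpack_int2(packed, n):
    """Recover n 2-bit codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]
```

In a fused kernel, the unpack step would happen in registers right before the attention math, so the cache stays at 2 bits per entry in memory, a 8x reduction versus FP16.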

github.com/ReinforceAI/spiral

submitted by /u/Financial_Buy_2287