We open-sourced Chaperone-Thinking-LQ-1.0 — a 4-bit GPTQ + QLoRA fine-tuned DeepSeek-R1-32B that hits 84% on MedQA in ~20GB

Reddit r/MachineLearning / 4/22/2026


Key Points

  • The team open-sourced Chaperone-Thinking-LQ-1.0 on Hugging Face, a 4-bit GPTQ + QLoRA fine-tuned reasoning model based on DeepSeek-R1-32B.
  • Their optimization pipeline includes 4-bit GPTQ quantization (shrinking the model from ~60GB to ~20GB), calibration on representative data during quantization to reduce accuracy loss, and QLoRA fine-tuning on medical/scientific corpora.
  • The model reports strong benchmark performance, notably 84% on MedQA, within about 4 points of GPT-4o's reported ~88% on the same benchmark.
  • They also report improved inference efficiency, with 36.86 tok/s throughput versus 22.84 tok/s for the base DeepSeek-R1-32B (about 1.6× faster and ~43% lower median latency).
  • The release targets on-prem enterprise healthcare use cases requiring strict data sovereignty, emphasizing that it can avoid external API calls while attaining near-frontier performance at lower cost.

Hey everyone,

We just open-sourced our reasoning model, Chaperone-Thinking-LQ-1.0, on Hugging Face. It's built on DeepSeek-R1-Distill-Qwen-32B but goes well beyond a simple quantization — here's what we actually did:

The pipeline:

  1. 4-bit GPTQ quantization — compressed the model from ~60GB down to ~20GB
  2. Calibration during quantization — GPTQ calibrates against representative data to minimize accuracy loss (post-training calibration, not full quantization-aware training)
  3. QLoRA fine-tuning on medical and scientific corpora
  4. Removed the adaptive identity layer for transparency — the model correctly attributes its architecture to DeepSeek's original work
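The pipeline above could be wired up roughly like this with Hugging Face `transformers` and `peft`. This is a sketch only: the specific hyperparameters (group size, calibration set, LoRA rank, and target modules) are illustrative assumptions, not the release's published settings:

```python
# Sketch of the quantize-then-adapt pipeline; all hyperparameters below
# are assumptions for illustration, not the authors' actual configuration.
from transformers import GPTQConfig
from peft import LoraConfig

# Steps 1-2: 4-bit GPTQ with a calibration corpus (built-in "c4" set here).
gptq_cfg = GPTQConfig(
    bits=4,
    group_size=128,   # assumed grouping; a common GPTQ default
    dataset="c4",     # calibration data consumed during quantization
)

# Step 3: QLoRA adapters for the domain fine-tune.
lora_cfg = LoraConfig(
    r=16,                 # assumed adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# gptq_cfg would be passed as quantization_config= to
# AutoModelForCausalLM.from_pretrained(...), then the quantized model
# wrapped with peft.get_peft_model(model, lora_cfg) and trained with a
# standard SFT loop over the medical/scientific corpora.
```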

Results:

| Benchmark | Chaperone-Thinking-LQ-1.0 | DeepSeek-R1 | OpenAI-o1-1217 |
|---|---|---|---|
| MATH-500 | 91.9 | 97.3 | 96.4 |
| MMLU | 85.9 | 90.8 | 91.8 |
| AIME 2024 | 66.7 | 79.8 | 79.2 |
| GPQA Diamond | 56.7 | 71.5 | 75.7 |
| MedQA | 84% | — | — |

MedQA is the headline — 84% accuracy, within 4 points of GPT-4o (~88%), in a model that fits on a single L40/L40s GPU.
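The size figures roughly check out from the parameter count alone. A back-of-the-envelope estimate, assuming ~32.8B parameters for the Qwen-32B base (exact checkpoint sizes depend on which tensors stay in higher precision, e.g. embeddings and the LM head):

```python
# Rough checkpoint-size arithmetic; 32.8e9 is the assumed parameter count
# for DeepSeek-R1-Distill-Qwen-32B, not a figure from the post.
N_PARAMS = 32.8e9

def model_size_gb(n_params, bits_per_weight):
    """Approximate checkpoint size in GB for a uniform bit width."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = model_size_gb(N_PARAMS, 16)  # ~65.6 GB, the order of the ~60GB cited

# GPTQ stores a 16-bit scale plus a 4-bit zero-point per group of 128
# weights, adding about 20/128 ~= 0.16 extra bits per weight.
int4_gb = model_size_gb(N_PARAMS, 4 + 20 / 128)  # ~17 GB before any
# unquantized tensors, which push the total toward the ~20GB in the post.

print(f"fp16 ~ {fp16_gb:.1f} GB, 4-bit GPTQ ~ {int4_gb:.1f} GB")
```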

Speed: 36.86 tok/s throughput vs 22.84 tok/s for the base DeepSeek-R1-32B — about 1.6x faster with ~43% lower median latency.
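The speedup figure follows directly from the two reported throughputs (the ~43% latency reduction is presumably a separate median-latency measurement, not derivable from these two numbers):

```python
# Reported decode throughputs (tokens/second) from the post.
base_tps = 22.84    # base DeepSeek-R1-32B
quant_tps = 36.86   # Chaperone-Thinking-LQ-1.0

speedup = quant_tps / base_tps
print(f"{speedup:.2f}x faster")  # 1.61x, matching the "about 1.6x" claim
```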

Why we did it: We needed a reasoning model that could run on-prem for enterprise healthcare clients with strict data sovereignty requirements. No API calls to OpenAI, no data leaving the building. Turns out, with the right optimization pipeline, you can get pretty close to frontier performance at a fraction of the cost.

Download: https://huggingface.co/empirischtech/DeepSeek-R1-Distill-Qwen-32B-gptq-4bit

License is CC-BY-4.0. Happy to answer questions about the pipeline, benchmarks, or deployment.

submitted by /u/AltruisticCouple3491