Chaperone-Thinking-LQ-1.0をオープンソース公開—4ビットGPTQ＋QLoRAで微調整したDeepSeek-R1-32B、MedQAで84%（約20GB）

Reddit r/MachineLearning / 2026/4/22

📰 ニュースDeveloper Stack & InfrastructureTools & Practical UsageIndustry & Market MovesModels & Research

共有:

要点

チームは、Hugging Faceで推論モデル「Chaperone-Thinking-LQ-1.0」をオープンソース公開しました。DeepSeek-R1-32Bをベースに、4ビットGPTQ＋QLoRAで微調整した推論モデルです。
最適化パイプラインには、4ビットGPTQによる量子化（約60GBから約20GBへ圧縮）、精度低下を抑えるためのキャリブレーション付き量子化認識学習（QAT）、医療・科学コーパスでの追加ファインチューニングが含まれます。
ベンチマークでは特にMedQAで84%を達成し、GPT-4oのMedQA水準（約88%）に対しておよそ4ポイント以内と報告されています。
推論効率も改善しているとされ、ベースのDeepSeek-R1-32B（22.84 tok/s）に対して36.86 tok/sのスループットを報告（約1.6倍高速、中央値レイテンシ約43%低減）しています。
リリースの狙いは、厳格なデータ主権要件を持つ企業向け医療のオンプレ環境で、外部API呼び出しを避けつつ低コストでフロンティアに近い性能を目指す点にあります。

Hey everyone,

We just open-sourced our reasoning model, Chaperone-Thinking-LQ-1.0, on Hugging Face. It's built on DeepSeek-R1-Distill-Qwen-32B but goes well beyond a simple quantization — here's what we actually did:

The pipeline:

4-bit GPTQ quantization — compressed the model from ~60GB down to ~20GB
Quantization-aware training (QAT) via GPTQ with calibration to minimize accuracy loss
QLoRA fine-tuning on medical and scientific corpora
Removed the adaptive identity layer for transparency — the model correctly attributes its architecture to DeepSeek's original work

Results:

Benchmark	Chaperone-Thinking-LQ-1.0	DeepSeek-R1	OpenAI-o1-1217
MATH-500	91.9	97.3	96.4
MMLU	85.9	90.8	91.8
AIME 2024	66.7	79.8	79.2
GPQA Diamond	56.7	71.5	75.7
MedQA	84%	—	—

MedQA is the headline — 84% accuracy, within 4 points of GPT-4o (~88%), in a model that fits on a single L40/L40s GPU.

Speed: 36.86 tok/s throughput vs 22.84 tok/s for the base DeepSeek-R1-32B — about 1.6x faster with ~43% lower median latency.

Why we did it: We needed a reasoning model that could run on-prem for enterprise healthcare clients with strict data sovereignty requirements. No API calls to OpenAI, no data leaving the building. Turns out, with the right optimization pipeline, you can get pretty close to frontier performance at a fraction of the cost.

Download: https://huggingface.co/empirischtech/DeepSeek-R1-Distill-Qwen-32B-gptq-4bit

License is CC-BY-4.0. Happy to answer questions about the pipeline, benchmarks, or deployment.

submitted by /u/AltruisticCouple3491
[link] [comments]

Black Hat USA

AI Business

NAVERが開発！韓国語に特化した大規模言語モデル「HyperCLOVA X」

AI-SCHOLAR

東芝、イジングマシンを100倍高速化する新手法組み合わせ最適化で威力

日経XTECH

IPAが「Open Data Spaces」仕様公開、AIエージェント対応で海外も注目

日経XTECH

ソーシャルメディア投稿向けに特化した無料のAI検出アプリ

Reddit r/artificial

Chaperone-Thinking-LQ-1.0をオープンソース公開—4ビットGPTQ＋QLoRAで微調整したDeepSeek-R1-32B、MedQAで84%（約20GB）

要点

関連記事

Black Hat USA

NAVERが開発！韓国語に特化した大規模言語モデル「HyperCLOVA X」

東芝、イジングマシンを100倍高速化する新手法組み合わせ最適化で威力

IPAが「Open Data Spaces」仕様公開、AIエージェント対応で海外も注目

ソーシャルメディア投稿向けに特化した無料のAI検出アプリ

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

要点

関連記事

Black Hat USA

NAVERが開発！韓国語に特化した大規模言語モデル「HyperCLOVA X」

東芝、イジングマシンを100倍高速化する新手法 組み合わせ最適化で威力

IPAが「Open Data Spaces」仕様公開、AIエージェント対応で海外も注目

ソーシャルメディア投稿向けに特化した無料のAI検出アプリ

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

東芝、イジングマシンを100倍高速化する新手法組み合わせ最適化で威力