nvidia/gpt-oss-puzzle-88B · Hugging Face

Reddit r/LocalLLaMA / 3/26/2026

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • gpt-oss-puzzle-88B is a deployment-oriented large language model optimized by NVIDIA with Puzzle (a post-training NAS framework), derived from OpenAI's gpt-oss-120b.
  • It targets higher inference efficiency for both long- and short-context serving on NVIDIA H100-class hardware, especially for inference workloads where KV-cache bandwidth and memory capacity tend to be the bottleneck.
  • It reduces total parameters to ~88B (≈73% of the parent) while reporting throughput gains of 1.63× in long-context (64K/64K) and 1.22× in short-context (4K/4K) scenarios, and up to 2.82× on a single H100.
  • The model is a decoder-only Transformer: a modified Mixture-of-Experts (MoE) gpt-oss architecture that varies the expert count and the global/window attention pattern per layer.
  • Reasoning accuracy is reported to match or slightly exceed the parent across reasoning efforts.

gpt-oss-puzzle-88B is a deployment-optimized large language model developed by NVIDIA, derived from OpenAI's gpt-oss-120b.
The model is produced using Puzzle, a post-training neural architecture search (NAS) framework, with the goal of significantly improving inference efficiency for reasoning-heavy workloads while maintaining or improving accuracy across reasoning budgets.

The model is specifically optimized for long-context and short-context serving on NVIDIA H100-class hardware, where reasoning models are often bottlenecked by KV-cache bandwidth and memory capacity rather than raw compute.
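To see why KV-cache capacity, rather than raw compute, dominates at long context, a back-of-the-envelope estimate helps. The sketch below uses illustrative dimensions only (36 layers, 8 KV heads, head dim 64, fp16), not the published gpt-oss configuration:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes to cache K and V for one sequence: 2 tensors per layer,
    each of shape (seq_len, num_kv_heads, head_dim)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical dimensions -- NOT the real gpt-oss config.
total = kv_cache_bytes(num_layers=36, num_kv_heads=8, head_dim=64, seq_len=64 * 1024)
print(f"{total / 2**30:.1f} GiB per 64K-token sequence")  # 4.5 GiB
```

Multiplied across concurrent sequences in a serving batch, this per-sequence cost quickly exhausts HBM and its bandwidth, which is the pressure the Puzzle-optimized architecture is reported to relieve.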

Compared to its parent, gpt-oss-puzzle-88B:

  • Reduces total parameters to ~88B (≈73% of the parent),
  • Achieves 1.63× throughput improvement in long-context (64K/64K) scenarios on an 8×H100 node,
  • Achieves 1.22× throughput improvement in short-context (4K/4K) scenarios,
  • Delivers up to 2.82× throughput improvement on a single H100 GPU,
  • Matches or slightly exceeds parent accuracy across reasoning efforts.
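The figures above can be sanity-checked directly; the multipliers below come from the model card, while the baseline throughput is a hypothetical placeholder purely to show how the speedups compose:

```python
# Parameter reduction reported in the model card.
parent_params_b, puzzle_params_b = 120, 88
ratio = puzzle_params_b / parent_params_b
print(f"parameter ratio: {ratio:.0%}")  # ~73% of the parent

# Reported speedups; 1000 tok/s is a made-up parent baseline for illustration.
speedups = {
    "long-context 64K/64K, 8xH100": 1.63,
    "short-context 4K/4K, 8xH100": 1.22,
    "single H100": 2.82,
}
baseline_tok_s = 1000.0
derived = {k: baseline_tok_s * x for k, x in speedups.items()}
```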

Model Architecture

  • Architecture Type: Mixture-of-Experts Decoder-only Transformer
  • Network Architecture: Modified gpt-oss architecture with a varying number of experts per layer and a modified global/window attention pattern across layers.
  • Number of model parameters: 88B
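A heterogeneous stack like the one described, where each layer can get its own expert count and attention type, can be sketched as a per-layer config. All values below are hypothetical illustrations of the idea, not the published layer assignments:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerConfig:
    """Per-layer settings in a heterogeneous MoE stack (illustrative only)."""
    num_experts: int              # MoE experts available in this layer
    attention: str                # "global" (full context) or "window" (sliding)
    window: Optional[int] = None  # sliding-window span when attention == "window"

# Hypothetical 6-layer slice: a Puzzle-style NAS may assign each layer its own
# expert count and attention pattern instead of one uniform setting.
stack = [
    LayerConfig(128, "global"),
    LayerConfig(64, "window", 4096),
    LayerConfig(32, "window", 4096),
    LayerConfig(64, "window", 4096),
    LayerConfig(128, "global"),
    LayerConfig(32, "window", 4096),
]

# Windowed layers cap their KV cache at the window size, which is one way
# such a pattern reduces the long-context memory bottleneck.
assert all((l.attention == "window") == (l.window is not None) for l in stack)
```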
submitted by /u/jacek2023