[Tested] Running Qwen3.6-35B-A3B on an RTX 5090: The 18 t/s Trap and the Real Difference from Qwen3.5

Zenn / 4/22/2026

Key Points

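The article runs Qwen3.6-35B-A3B on an RTX 5090 and examines its behavior based on measured generation speed.
It cautions that a reported "18 t/s" may not reflect the performance you would see in steady use, because benchmark conditions and assumptions differ.
By comparing against Qwen3.5, a model from the same family, it lays out the "real difference": less a raw speed gap than a result that looks different depending on conditions and settings.
Through its test results, it shows that how the model is run (configuration, settings, and measurement method) strongly affects the outcome.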
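
To make the point about measurement method concrete, here is a minimal Python sketch of how the same run can yield very different "t/s" figures depending on what is counted. It is not the article's method or its diagnosis of the 18 t/s result. The `RunTiming` type, the function names, and all timing values are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Timings for one inference run (all values here are hypothetical)."""
    prompt_tokens: int         # tokens in the input prompt
    prompt_seconds: float      # time spent processing the prompt
    generated_tokens: int      # tokens produced by the model
    generation_seconds: float  # time spent generating those tokens


def generation_tps(t: RunTiming) -> float:
    """Token-generation (TG) throughput: generated tokens over generation time only."""
    return t.generated_tokens / t.generation_seconds


def end_to_end_tps(t: RunTiming) -> float:
    """Generated tokens over total wall time, including prompt processing."""
    return t.generated_tokens / (t.prompt_seconds + t.generation_seconds)


# Hypothetical run: a long prompt followed by a short answer.
run = RunTiming(prompt_tokens=4000, prompt_seconds=2.0,
                generated_tokens=100, generation_seconds=0.5)

print(f"TG only:    {generation_tps(run):.1f} t/s")
print(f"End-to-end: {end_to_end_tps(run):.1f} t/s")
```

With these made-up timings, the generation-only figure is five times the end-to-end figure, even though nothing about the model or hardware changed. When comparing two models, it helps to confirm that both numbers were computed the same way.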

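These results are as of April 2026. The numbers may change as llama.cpp and the model are updated.

Introduction

Qwen3.6-35B-A3B was released on April 15, 2026. The previous generation, Qwen3.5-35B-A3B, reaches a comfortable TG (token generation) speed of 214 t/s on an RTX 5090 with llama.cpp, and I actually use it as the backend for my AITuber. This time I ran hands-on tests to find out whether I could switch to Qwen3.6. The root cause of the 18 t/s figure I got at first turned out to be somewhere unexpected, so I am recording the whole process here. ...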

