Running Qwen3.5-35B-A3B on an RTX 4080 (16GB): The MoE Dream and the Reality of 16GB VRAM

Zenn / 3/14/2026

💬 Opinion | Tools & Practical Usage | Models & Research

Key Points

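The author tried running Qwen3.5-35B-A3B on an RTX 4080 with 16GB of VRAM to test how realistic it is to put a large model to practical use through MoE.
Memory management and techniques such as MoE, quantization, and offloading were evaluated under the 16GB VRAM constraint, and the limits of practicality were mapped out.
The experiment exposed practical problems with large-model inference speed, stability, and device support, and pointed out the gap between the dream and the reality.
As design guidance for the future, the article argues that weighing hardware cost against software optimization, hybrid inference, and model compression will be important.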

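Bottom line: it runs, but at 1.5 tok/s

Qwen3.5-35B-A3B is an MoE model advertised as having "3B active parameters". With only 3B active, it should run with room to spare. That was the assumption, and this is what came out of running it on an RTX 4080 (16GB).

| Model | Generation speed | Prompt processing | VRAM | Share on GPU |
|---|---|---|---|---|
| 35B-A3B Q4_K_M | 1.48 tok/s | 9.8 tok/s | 15,536 MiB | 59% |
| 35B-A3B (ctx=2048) | 1.66 tok/s | 15.2 tok/s | 15,480 MiB | 59% |
| 9B dense | 81.77 tok/s | 654.1 tok/s | 8,040 Mi... | |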
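The gap between "3B active parameters" and a 16GB card can be illustrated with some back-of-the-envelope arithmetic. In an MoE model only a subset of experts is used per token, but all expert weights still have to be stored somewhere, so the memory footprint follows the total parameter count rather than the active one. The sketch below is not from the original article. It assumes round figures of 35B total and 3B active parameters, and an average of about 4.85 bits per weight for Q4_K_M, which is an approximation that varies by model. It also ignores the KV cache and compute buffers.

```python
# Rough weight-memory estimate for a quantized MoE model.
# All constants are illustrative assumptions, not measurements from the article.

GIB = 1024 ** 3

TOTAL_PARAMS = 35e9    # assumed total parameter count (all experts)
ACTIVE_PARAMS = 3e9    # assumed parameters used per token
ASSUMED_BPW = 4.85     # assumed average bits per weight for Q4_K_M
VRAM_GIB = 16          # nominal VRAM of the RTX 4080


def quantized_weight_bytes(params: float, bits_per_weight: float) -> float:
    """Return the approximate size in bytes of `params` weights at the given bit width."""
    return params * bits_per_weight / 8


total_gib = quantized_weight_bytes(TOTAL_PARAMS, ASSUMED_BPW) / GIB
active_gib = quantized_weight_bytes(ACTIVE_PARAMS, ASSUMED_BPW) / GIB
fits_in_vram = total_gib <= VRAM_GIB

print(f"all weights:    {total_gib:.1f} GiB")
print(f"active weights: {active_gib:.1f} GiB")
print(f"fits in {VRAM_GIB} GiB of VRAM: {fits_in_vram}")
```

Under these assumptions the full set of weights comes to roughly 20 GiB, more than the card holds, even though the weights touched per token would fit many times over. That is consistent with only part of the model residing on the GPU in the table above, with the remainder presumably served from system memory.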


