
Llama CPP - any way to load model into VRAM+CPU+SSD with AMD?

Reddit r/LocalLLaMA / 3/19/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post investigates whether Llama CPP can run a giant model (around 170GB, e.g., Qwen3.5 397B Q3_K_S) by distributing data across VRAM, CPU RAM, and an SSD on an AMD system.
  • The user reports loading about 40GB into VRAM on a system with 48GB VRAM and observes the rest being accessed from SSD, with throughput around 0.11 tokens per second.
  • They ask whether this behavior is expected and request known best practices for heavy disk offloading and performance optimization with Llama CPP on AMD hardware.
  • The discussion is framed as a practical hardware/software optimization question rather than a new product release.

Doing the necessary pilgrimage of running a giant model (Qwen3.5 397B Q3_K_S ~170GB) on my system with the following specs:

  • Ryzen 9 3950X

  • 64GB DDR4 (3000 MHz, dual channel)

  • 48GB of VRAM (W6800 and RX 6800)

  • 4TB Crucial P3 Plus (Gen4 drive capped by a PCIe 3.0 motherboard)

Haven't had any luck setting up KTransformers. Is Llama CPP usable for this? I'm chasing something approaching 1 token per second but am stuck at 0.11 tokens/second. It seems my system fills the VRAM (~40GB) and then reads the rest from the SSD; there doesn't appear to be any way to say "load 60GB into RAM at the start".

Is this right? Is there a known best way to do heavy disk offloading with Llama CPP?
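For context on the behavior described above: llama.cpp memory-maps the GGUF file by default, so any weights that don't fit in VRAM plus RAM are demand-paged from the SSD on every token, which matches the ~0.11 tok/s observation. A minimal sketch of the usual knobs follows; the flag names are real llama.cpp options, but the model path and the layer count are illustrative placeholders that would need tuning for this 48GB-VRAM / 64GB-RAM system:

```shell
# Hedged sketch, not a verified config for this exact model:
#   -ngl N   : number of layers to offload to the GPUs (HIP/ROCm build for AMD)
#   --mlock  : pin the memory-mapped weights in RAM so the OS can't evict them
#              (only helps up to available RAM; the remainder still pages from SSD)
#   -t 16    : one thread per physical core on the 3950X
./llama-cli -m ./model.gguf -ngl 30 --mlock -t 16 -p "Hello"
```

Note that the opposite flag, `--no-mmap`, forces the whole model into RAM up front, which cannot work for a ~170GB model on 64GB of RAM; for a model this far over VRAM+RAM capacity, some amount of SSD paging per token is unavoidable with llama.cpp's current design.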

submitted by /u/EmPips