
Anyone have experience mixing Nvidia and AMD GPUs with llama.cpp? Is it stable?

Reddit r/LocalLLaMA / 3/16/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post seeks firsthand reports on how stable it is to mix Nvidia and AMD GPUs on Windows for llama.cpp workloads.
  • The author is weighing selling one Nvidia RTX 5090 and replacing it with two AMD 9700 Pro cards to gain VRAM, so Qwen 122B can run with less CPU offload.
  • A prior setup with two 5090s and a 5070 Ti reached about 80 tokens/sec; the author speculates the mixed setup might drop to roughly 50 tokens/sec.
  • The specific questions are stability and how much slower the Vulkan backend is than a pure Nvidia configuration (see the sketch below).
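
For readers weighing the same move, here is a minimal sketch of what a mixed-vendor split can look like through the llama-cpp-python bindings. It assumes the package was built against llama.cpp's Vulkan backend (e.g. `CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python`), since Vulkan is the backend that can address Nvidia and AMD cards in a single process on Windows; the model path and split ratios are illustrative assumptions, not the author's settings.

```python
# Hedged sketch: splitting one GGUF model across mixed Nvidia + AMD GPUs
# via llama.cpp's Vulkan backend, using the llama-cpp-python bindings.
# The model path and split ratios below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen-122b-q4_k_m.gguf",  # hypothetical quantized model file
    n_gpu_layers=-1,               # offload every layer to avoid CPU fallback
    tensor_split=[0.4, 0.3, 0.3],  # share of layers per device, in Vulkan enumeration order
    main_gpu=0,                    # keep scratch/output buffers on the first device
    n_ctx=8192,                    # context length; size it to the remaining VRAM
)

out = llm("Q: Is mixing GPU vendors stable?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

With the default layer split, each card owns whole transformer layers, so a slower card mainly slows the fraction of layers assigned to it, which is broadly consistent with the 80 to ~50 tokens/sec guess below. Device order follows Vulkan enumeration, so it is worth checking the backend's startup log before tuning the ratios.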

I currently have two 5090s in one system for AI, on a ProArt X870E, and am debating selling one 5090 and replacing it with two AMD 9700 Pro cards for more VRAM, so I can run Qwen 122B (and that new Nvidia model) more easily than offloading to CPU. I'm not too bothered about the speed as long as it doesn't slow down too much. I'm more wondering whether it's stable and how much difference Vulkan makes versus pure Nvidia.

When I tested the two 5090s with a 5070 Ti from my partner's gaming PC, I got around 80 tokens/sec. I'm aware it might drop to around 50 with this setup, but that's still decent, I think. I use the main 5090 for gaming when I'm not using AI. Please don't advise me to keep the 5090; I'd just like people's experiences with the stability of mixing AMD and Nvidia cards on Windows, etc. Thanks.

submitted by /u/fluffywuffie90210
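
As context for the VRAM motivation, a rough back-of-the-envelope check, assuming 32 GB per card for both the RTX 5090 and the AMD 9700 Pro and about 4.5 bits per weight for a Q4_K_M-style quantization (both figures are assumptions, not from the post):

```python
# Rough, hedged VRAM estimate for the proposed swap; card capacities and
# bits-per-weight are illustrative assumptions, not figures from the post.
GiB = 1024**3

params = 122e9                 # "Qwen 122B" parameter count as stated
bits_per_weight = 4.5          # typical for a Q4_K_M-style quantization
weights_gib = params * bits_per_weight / 8 / GiB

current_vram = 2 * 32          # two RTX 5090s at 32 GB each
proposed_vram = 32 + 2 * 32    # one 5090 plus two assumed-32 GB 9700 Pro cards

print(f"weights alone: ~{weights_gib:.0f} GiB (plus KV cache and buffers)")
print(f"current VRAM:  {current_vram} GB -> CPU offload likely")
print(f"proposed VRAM: {proposed_vram} GB -> weights fit on the GPUs")
```

Under those assumptions the weights alone come to roughly 64 GiB, which would explain why a two-card 64 GB pool forces CPU offload while a three-card 96 GB pool would not.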