Hey all,
I have a Framework Desktop with the AMD Ryzen AI Max+ 395 (Strix Halo), the one with 128 GB of unified RAM where a huge chunk can be dedicated to the GPU.
I'm trying to use LM Studio but I can't get it to work at all, and I feel like it's user error. My issue is two-fold. First, every model appears to load into system RAM: a 70 GB Qwen3 model, for example, loads into RAM, then tries to move to the GPU and fails. Second, if I type anything into the chat, it errors out. I can't get it to stop loading models into RAM, despite selecting the GPU in the llama.cpp runtime settings.
I'm on the latest LM Studio with the latest llama.cpp runtime that ships with it. I've also set the model's GPU offload to the maximum number of layers. In the BIOS I've tried dedicating 96 GB to VRAM, and I've also tried leaving it on Auto.
Nothing works.
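In case it helps, here's how I've been trying to sanity-check GPU offload outside of LM Studio (a minimal sketch assuming llama-cpp-python built with a Vulkan or ROCm backend; the model path is just a placeholder for my Qwen3 GGUF):

```python
# Minimal sanity check of GPU offload outside LM Studio.
# Assumes llama-cpp-python was installed with a GPU backend
# (Vulkan or ROCm build); the model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-70b.gguf",  # placeholder path to the GGUF file
    n_gpu_layers=-1,              # -1 = offload every layer to the GPU
    verbose=True,                 # logs which backend/device gets used
)

out = llm("Say hello.", max_tokens=16)
print(out["choices"][0]["text"])
```

If something like this offloads correctly for you on Strix Halo, that would at least tell me the problem is in my LM Studio settings rather than in the drivers.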
Is there something I'm missing here, or a tutorial you could point me to?
Thanks!
