I crossposted this from https://github.com/ggml-org/llama.cpp/discussions/20642 and would love it if anyone had an answer. I was looking into how I could offload the expert tensors to a specific GPU, and I am also looking for a way to do the same with the KV cache.
The reason is that I have a weak GPU and a strong GPU, and I want only the non-expert tensors on the strong GPU, with everything else on the weaker one.




