I created a auto-tuning script for llama.cpp,ik_llama.cpp that gets you the max tokens per seconds on weird setups like mine 3090ti + 4070 + 3060.
No more Flag configuration, OOM crashing yay
[link] [comments]
Reddit r/LocalLLaMA / 3/11/2026
I created a auto-tuning script for llama.cpp,ik_llama.cpp that gets you the max tokens per seconds on weird setups like mine 3090ti + 4070 + 3060.
No more Flag configuration, OOM crashing yay