What are you real life experience? Are you gaining anything by running on ik_llama? Is it relevant today?
I tried to run few large models on it recently completely in GPUs, and had mixed results. Seemed like llamacpp provided more stability and the gains of ik were not obviously. That was for glm 5 and kimi 2.5 quants. Before doing more testing wanted to check with the community.
PS. If people have positive experience with it - I'm planning on testing few models side by side and posting results here. Those are large ones so didnt wanna go down the rabbit whole before getting some feedback.
[link] [comments]