ran into the q4 vs q5 wall again this morning. 70b model. 24gb card. q4 fits with margin, q5 fits if i kill everything else on the gpu and pray.
did the math on actual quality difference for my use case (mostly code generation on a private codebase). benchmarks online give me a 1-2 point delta on humaneval. that's not nothing but it's also not enough to tell me whether the q5 squeeze is worth running everything closer to the redline.
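a rough sketch of why a 1-2 point humaneval delta is hard to act on. humaneval has 164 problems, so each pass rate carries a few points of sampling noise by itself. the pass rates below (78% and 80%) are made-up placeholders, not measured numbers, and the function names are my own. it also treats the two runs as independent, which is likely pessimistic since both quants see the same problems.

```python
import math

def delta_and_stderr(p_a, p_b, n):
    """Difference between two pass rates and the standard error of that
    difference, treating each run as an independent binomial over n problems."""
    delta = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    return delta, se

def problems_needed(delta, p, z=2.0):
    """Rough number of problems needed before a difference of `delta`
    is about z standard errors wide, at a pass rate near p."""
    return math.ceil(z * z * 2 * p * (1 - p) / (delta * delta))

# hypothetical pass rates: q4 = 0.78, q5 = 0.80, on humaneval's 164 problems
delta, se = delta_and_stderr(0.78, 0.80, 164)
print(f"delta = {delta * 100:.1f} pts, standard error = {se * 100:.1f} pts")
print("problems needed for a 2 pt gap:", problems_needed(0.02, 0.80))
```

with these placeholder numbers the standard error is bigger than the gap itself, and telling a 2 point gap apart would take on the order of a few thousand problems. a paired comparison on the same prompts from my own codebase would be tighter than this estimate, but the general point holds: the public delta doesn't settle it.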
how do people running larger models day to day actually decide between q4 and q5 on this kind of setup? i keep flip-flopping every couple of weeks and at this point i'm pretty sure i'm just overthinking it. probably going to flip a coin tomorrow.




