Not "what benchmarks the best" or "what has the most parameters." I mean in your actual daily use.
If you had to pick one model to run locally on something like a 4090 or 3090 and use for real work, what is your go-to?
I am curious about the gap between the benchmark leaders and what is actually usable at decent context lengths, without quantization artifacts turning the output into garbage.
What is your sweet spot for capability vs. hardware reality?
