I keep trying new models, but the bigger difference usually comes from the setup around them, not the model itself.
backend
frontend
RAG or no RAG
quant choice
GPU offload
context settings
prompt format
whatever janky glue holds it together
A lot of local setups look great in screenshots, then feel annoying in real use after two days.
Right now I am more interested in stacks that people actually stick with than in benchmark wins.
What are you running daily, and what part of your setup ended up mattering way more than expected?