Wanting to make sure I’m not missing something here. I see a lot of posts around performance on new hardware and it feels like it’s always on a small context at missing the information around quantization.
I’m under the impression that use cases for llms generally require substantially larger contexts. Mine range from 4-8k with embedding to 50k+ when working on my small code bases. I’m also aware of the impact that quants make on the models performance in what it returns and its speed (inc. kv quants).
I don’t think my use cases are all that different from probably the majority of people so I’m trying to understand the focus of testing on small contexts and no other information. Am I missing what these types of tests demonstrate or a key insight into AI platforms inner workings?
Comments appreciated.
[link] [comments]



