Everybody here is posting their optimizations for running different models. That's good, but make these benchmarks realistic: speed is not the only factor in running an LLM effectively.
- Context size is key. For agentic/coding/RAG work you need a proper context size, so if you benchmark, do a round trip with a long session or a bigger context; that is how you get a realistic, real-life environment.
- If you are testing multimodal models, actually use the multimodal features. Run benchmarks with image processing, for example; this brings far more value for real-world scenarios.
- State your exact hardware config. All cards come in different variants.
- Benchmark parallel processing as well. For agentic work this matters too.
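As an illustration of the multimodal point: a benchmark request against an OpenAI-compatible local server (e.g. llama.cpp server or vLLM) typically embeds the image as a base64 data URL inside the chat payload. This is only a sketch of the payload shape; the model name is a placeholder and you would send this to your own endpoint:

```python
import base64
import json

def build_multimodal_request(image_bytes: bytes, prompt: str, model: str = "local-model"):
    """Build an OpenAI-style chat payload with an inline base64 image.
    The model name is a placeholder - adjust for your server."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Time the full round trip (payload build + request + decode) in a real run,
# not just raw text generation. Fake bytes stand in for a real image here.
payload = build_multimodal_request(b"\x89PNG fake bytes", "Describe this image.")
print(json.dumps(payload)[:60])
```

Timing the whole round trip with a real image exercises the vision encoder too, which pure text benchmarks skip entirely.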
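And for the parallel-processing point, a minimal throughput harness could look like the sketch below. The `generate` callable is a stand-in: in practice you would point it at your server's API and feed it long, realistic prompts (full agent sessions, not one-liners); the stub here just sleeps to mimic latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(generate, prompts, concurrency=4):
    """Run prompts with `concurrency` parallel workers and report
    aggregate throughput. `generate(prompt)` must return (text, n_tokens)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total_tokens = sum(n for _, n in results)
    return {"tokens_per_sec": total_tokens / elapsed, "elapsed_s": elapsed}

# Stub standing in for a real API call; replace with an HTTP request
# to your local server to measure actual parallel throughput.
def fake_generate(prompt):
    time.sleep(0.05)
    return "x" * 32, 32

stats = benchmark(fake_generate, ["long realistic prompt"] * 8, concurrency=4)
print(f"{stats['tokens_per_sec']:.0f} tok/s over {stats['elapsed_s']:.2f}s")
```

Comparing the numbers at concurrency 1 vs 4 vs 8 shows how much batching your setup actually delivers, which single-stream tokens/sec hides.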
Make your posts more useful for the community!