Reading phoronix I have stumbled over a post mentioning https://gitlab.com/IsolatedOctopi/nvidia_greenboost , a kernel module to boost LLM performance by extending the CUDA memory by DDR4 RAM.
The idea looks neat, but several details made me doubt this is going to help for optimized setups. Measuring performance improvements using ollama is nice but I would rater use llama.cpp or vllm anyways.
What do you think about it?
[link] [comments]




