Our company uses Hugging Face TGI as the default engine on AWS SageMaker AI. I've had really bad experiences with TGI compared to my home setup using llama.cpp and vLLM.
I just saw that Hugging Face has ended new development of TGI:
https://huggingface.co/docs/text-generation-inference/index
There were debates a couple of years ago about which one was better: vLLM or TGI. I guess we have an answer now.