According to what I've been reading (and also according to all models I've asked about this), the consensus seems to be that Min P is the better/more modern approach to sampling and that it should be preferred over Top P/Top K, which should be used only if Min P isn't available or for legacy reasons...
Yet, looking at recently published LLMs on Hugging Face and elsewhere, the recommended sampling parameters are still largely Top K and/or Top P. Is this only for legacy reasons? Or is there some other reason?
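For context, here's a minimal sketch of what Min P filtering does, assuming the common definition (keep every token whose probability is at least `min_p` times the probability of the most likely token, then renormalize). Real samplers operate on logits before softmax, but the idea is the same:

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Zero out tokens below min_p * max(probs), then renormalize.

    Illustrative sketch of Min P sampling; `probs` is a probability
    distribution over the vocabulary (sums to 1).
    """
    threshold = min_p * probs.max()          # dynamic cutoff scales with model confidence
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()         # renormalize the surviving mass

# Toy distribution: with min_p=0.1 the cutoff is 0.1 * 0.5 = 0.05,
# so the last two tokens (0.04 and 0.01) are dropped.
probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])
print(min_p_filter(probs, min_p=0.1))
```

Unlike Top K (fixed count) or Top P (fixed cumulative mass), the cutoff adapts: when the model is confident (one dominant token), the threshold is high and few tokens survive; when the distribution is flat, the threshold drops and more candidates remain.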



