Looking for suggestions for a model that can fit in 24GB VRAM and 64GB RAM (if needed) that could run at least a 20-40 tokens/second.
I need to take input text or image and classify content based on a provided taxonomy list, summarize the input or explain pros/cons (probably needs another set of rules added to the prompt to follow) and return structured data. Thanks.
[link] [comments]