Which model should I try?

Reddit r/LocalLLaMA / 5/3/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The author is looking for suggestions on which additional LLMs to try for a workflow that includes coding in Python/C++ and writing technical reports.
  • They currently use Qwen3.6 27B and Gemma4 31B, and previously tested Deepseek but found it too slow for their real-world usage.
  • They clarify they are not asking how to speed up models; instead, they want recommendations for other models that may better match their throughput constraints.
  • Their available hardware includes MI50 32GB and V100 32GB, and they consider responses below 10 tokens per second to be unacceptably slow.
  • They also indicate they already downscale via quantization or smaller variants when VRAM is insufficient, and they abandon models when latency is too high.

In my current workflow (coding in Python/C++ and technical reports) I mostly use Qwen3.6 27B and Gemma4 31B. In the past I tried other models like Deepseek with decent results, but it was painfully slow... so do you think there is some model I'm missing and should try?

EDIT: to be clear, I'm not asking how to make those models run faster; I'm asking which other models I should try. Telling me to try them all doesn't help: first, because there are a bazillion models available and nobody on earth could reasonably try them all; and second, if I were willing to try them all, I wouldn't have asked here. If a model needs more VRAM than I have available, I already scale down, either on the quantization or on the model size itself if possible, or I abandon the model because it's too slow.
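For context, the "scale down" step described above is usually a back-of-envelope calculation before downloading anything. Here is a minimal sketch in Python; the overhead factor and the quantization bit-widths are illustrative assumptions, not figures from the post:

```python
# Rough VRAM-fit check for a quantized model: weight memory is roughly
# params * bits / 8, plus an overhead factor for KV cache, activations,
# and runtime buffers. All numbers here are illustrative assumptions.

def fits_in_vram(params_b: float, bits_per_weight: float,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Return True if the quantized weights (plus overhead) fit in VRAM.

    params_b        -- parameter count in billions
    bits_per_weight -- e.g. 16 (fp16), 8 (Q8), ~4.5 (Q4_K_M)
    vram_gb         -- available VRAM in gigabytes
    overhead        -- fudge factor for KV cache and buffers (assumption)
    """
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead <= vram_gb

# Example: a 27B model on a single 32 GB card at different quantizations.
for bits in (16, 8, 4.5):
    print(f"27B @ {bits}-bit fits in 32 GB: {fits_in_vram(27, bits, 32)}")
```

At fp16 a 27B model is well past 32 GB, while a ~4.5-bit quant fits comfortably, which matches the "scale down on the quantization" approach the author describes.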

System specs: MI50 32GB + V100 32GB. And anything below 10 tps in real-world usage is "painfully slow".
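A quick way to apply that 10 tps bar when auditioning a new model is to time a real completion. The sketch below is backend-agnostic; the `generate` callable is a hypothetical stand-in for whatever serving stack is in use, and only the timing logic is the point:

```python
import time
from typing import Callable, Tuple

def meets_tps_floor(generate: Callable[[str], Tuple[str, int]],
                    prompt: str, floor_tps: float = 10.0) -> bool:
    """Time one generation and compare decode throughput to a floor.

    generate -- hypothetical backend call returning
                (completion_text, completion_token_count)
    """
    start = time.perf_counter()
    _, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    tps = n_tokens / elapsed if elapsed > 0 else 0.0
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {tps:.1f} tps")
    return tps >= floor_tps

# Usage with a dummy backend that fakes a slow model:
def slow_dummy(prompt: str) -> Tuple[str, int]:
    time.sleep(2.0)          # pretend decoding took 2 seconds
    return "example output", 15

if __name__ == "__main__":
    ok = meets_tps_floor(slow_dummy, "Write a Python function to parse a log file")
    print("keep model" if ok else "abandon model: below 10 tps")
```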

submitted by /u/WhatererBlah555