How to settle on a coding LLM? What parameters to watch out for?

Reddit r/LocalLLaMA / 3/23/2026



Key Points

The post compares local LLMs (Qwen 3.5 27B vs 35BA3B) at 8-bit quantization on an M4 Max with 64GB RAM using a simple Bomberman prompt and reports unplayable results.
It asks how to quickly benchmark coding LLMs, whether the prompt was insufficient, and what realistic expectations should be for local models.
It inquires about possible configuration tweaks (e.g., context length) since it didn't adjust many settings.
It seeks a recommended go-to model for similar hardware and invites community advice.

Hey guys,

I'm new to local LLMs and I have set up Claude Code locally, hooked up to oMLX. I have an M4 Max with 40 cores and 64 GB of RAM.

I wanted to quickly benchmark Qwen 3.5 27B against 35BA3B, both at 8-bit quantization. I didn't configure any parameters and just gave it a go with the following instruction: "Make me a small web based bomberman game".
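For context on the hardware side, here is a rough back-of-envelope sketch of how much of the 64 GB the weights alone might occupy at 8-bit. It assumes roughly 1 byte per parameter, reads "35BA3B" as about 35B total parameters, and ignores quantization scale overhead, the KV cache, and everything else running on the machine, so the real footprint will be higher. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits,
    with no per-block scales or other format overhead.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter counts are read off the model names in the post (assumed totals).
for name, params in [("27B", 27.0), ("35BA3B", 35.0)]:
    print(f"{name}: ~{weight_memory_gb(params, 8):.0f} GB of weights at 8-bit")
```

Under these assumptions both models fit in 64 GB, but the larger one leaves noticeably less headroom for context.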

It took approximately 3-10 minutes for each, but the result is completely unplayable. Even after two or three follow-up prompts describing the issues, the game still wouldn't work. Each subsequent prompt significantly increases the time to output. Now I want to understand the following:

1- How do you guys quickly benchmark coding LLMs? Was my prompt too weak for local LLM intelligence and capability? How should I set my expectations?
2- Am I missing something configuration-wise? Perhaps tuning the context length for higher quality? I'm not even sure I configured anything there...
3- If you have a similar machine, is there a go-to model you would recommend?

Thanks a lot guys

submitted by /u/shirogeek