Those of you running minimax 2.7 locally, how are you feeling about it?

Reddit r/LocalLLaMA / 4/17/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit user reports running MiniMax-M2.7 (raw, no quantization) from the Hugging Face release on three RTX 6000 GPUs via vLLM and says the model behavior feels “off.”
  • They observe inconsistent results on the same coding/evaluation workloads compared with MiniMax 2.5, with human raters scoring outputs lower on some tasks.
  • The user also notes noticeable quality issues such as spelling mistakes and formatting/spacing errors (e.g., merging tokens like `const variable` into `constvariable`).
  • They have tried re-downloading the model twice from the Hugging Face repository, but the issues reportedly persisted.
  • The post asks other local users for their experiences and includes the sampling parameters used (temperature 1.0, top_p 0.95, top_k 40, repetition_penalty 1.15, max_tokens 16384).

I'm running the raw version straight from the MiniMax release on Hugging Face (https://huggingface.co/MiniMaxAI/MiniMax-M2.7) on 3 RTX Pro 6000s via vLLM, so no quantization. And I'm not going to lie, something feels off about it.

Same workloads in our coding environment, including our reusable evals on problem solving in our codebase, and it's very inconsistent. Our human raters are scoring its output lower than 2.5 on some tasks.

It's also not uncommon for it to make a spelling error or drop a space: for example, `const variable = something` will instead come out as `constvariable =something`, and it then has to go back and fix it.

Anyone else experiencing any weirdness with the model? I've re-downloaded straight from the HF repo twice and it's the same result.

Sampling params:

```
--override-generation-config '{
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "repetition_penalty": 1.15,
  "max_tokens": 16384
}'
```
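Since `--override-generation-config` takes a raw JSON string on the command line, a malformed string (stray comma, wrong quoting through the shell) is an easy way to end up sampling with defaults rather than the values you intended. A quick sanity check is to parse the exact string you pass, using the post's parameters:

```python
import json

# The JSON string as it would be passed to vLLM's --override-generation-config.
override = '''{
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 40,
  "repetition_penalty": 1.15,
  "max_tokens": 16384
}'''

# json.loads raises ValueError on any syntax error, so this doubles as validation.
config = json.loads(override)
print(config["temperature"])  # 1.0
```

It can also be worth checking these values against the sampling settings the model card recommends, since defaults tuned for one release don't always carry over to the next.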

submitted by /u/laterbreh