Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s
Reddit r/LocalLLaMA, posted by /u/tedivm, 4/27/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The post shares a simple, ready-to-run vLLM Docker setup for serving the Qwen3.6 27B model locally.
- The model is quantized with Lorbus AutoRound INT4, reducing memory footprint and improving inference efficiency.
- The configuration also enables MTP (multi-token prediction) speculative decoding to accelerate token generation.
- The author reports roughly 118 tokens per second on two NVIDIA RTX 3090 GPUs.
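The digest does not reproduce the actual command, but a setup like the one described is typically launched with the official `vllm/vllm-openai` image. The sketch below is an assumption-laden illustration, not the author's configuration: the model id and the speculative-decoding JSON are hypothetical, and flag support for MTP depends on the vLLM version.

```shell
# Hypothetical sketch (not from the post): serve an AutoRound INT4
# quantized model with vLLM split across two GPUs.
# The model id and the speculative-config JSON are assumptions.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen3.6-27B-AutoRound-INT4 \
  --tensor-parallel-size 2 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 2}'
```

Once running, the container exposes an OpenAI-compatible API on port 8000, so any OpenAI-style client can query the model locally.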