Simple to use vLLM Docker Container for Qwen3.6 27b with Lorbus AutoRound INT4 quant and MTP speculative decoding - 118 tokens/second on 2x 3090s

Reddit r/LocalLLaMA / 4/27/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post shares a simple, ready-to-run vLLM Docker setup for serving the Qwen3.6 27B model locally.
  • It uses Lorbus AutoRound INT4 quantization to reduce model size and improve inference efficiency.
  • The configuration also applies MTP speculative decoding to accelerate token generation.
  • The author reports throughput of about 118 tokens per second with the model split across two NVIDIA RTX 3090 GPUs.
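The setup described above can be approximated with vLLM's official OpenAI-compatible Docker image. This is a minimal sketch, not the author's actual command (which the post summary does not include): the model ID is a placeholder, and the exact speculative-decoding flags for MTP vary by vLLM version, so that part is shown as a commented hint rather than a verified flag set.

```shell
# Hypothetical sketch of a 2-GPU vLLM serving command; the model ID below
# is a placeholder, not the actual quantized checkpoint from the post.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model <autoround-int4-model-id> \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.90
  # MTP speculative decoding is configured separately in recent vLLM
  # releases (e.g. via a --speculative-config JSON option); consult the
  # vLLM docs for the flag syntax matching your installed version.
```

AutoRound-quantized checkpoints ship their quantization config in the model files, so vLLM can typically detect the INT4 format at load time; `--tensor-parallel-size 2` is what shards the 27B model across the two 3090s.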