v0.17.1

vLLM Releases / 3/11/2026

📰 News | Developer Stack & Infrastructure

Key Points

  • v0.17.1 is a patch release on top of v0.17.0 that fixes six issues.
  • It fixes how activation_type is passed to trtllm fused MoE for NVFP4 and FP8.
  • It restores support for nongated fused MoE with Triton, re-enables expert parallelism (EP) for the trtllm MoE FP8 backend, and fixes the TRTLLM Block FP8 MoE Monolithic path.
  • Freed SSM cache blocks are now zeroed on the GPU for Mamba and Qwen3.5 models, and Indexer MTP handling is optimized for DSV3.2.
  • Four of the six fixes target fused MoE paths (three of them trtllm-specific); the other two cover SSM cache handling and the DSV3.2 Indexer, so the patch matters mainly to deployments that use those features.

This is a patch release on top of v0.17.0 to address a few issues:

  • Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017)
  • Fix/resupport nongated fused moe triton (#36412)
  • Re-enable EP for trtllm MoE FP8 backend (#36494)
  • [Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)
  • Fix TRTLLM Block FP8 MoE Monolithic (#36296)
  • [DSV3.2][MTP] Optimize Indexer MTP handling (#36723)
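
The note for #35219 does not say why freed SSM cache blocks are zeroed. One plausible reading is that a recurrent state block reused without being cleared could carry the previous request's state into the next one. The toy pool below sketches that failure mode under that assumption. It is not vLLM's cache manager, and every name in it (ToyStatePool, allocate, free, run_request) is hypothetical.

```python
# Illustrative sketch only: a toy block pool, NOT vLLM's actual cache manager.
# All names here (ToyStatePool, allocate, free, run_request) are hypothetical.


class ToyStatePool:
    """Fixed pool of recurrent-state blocks, each modeled as a list of floats."""

    def __init__(self, num_blocks: int, block_size: int, zero_on_free: bool):
        self.blocks = [[0.0] * block_size for _ in range(num_blocks)]
        self.free_ids = list(range(num_blocks))
        self.zero_on_free = zero_on_free

    def allocate(self) -> int:
        # Hand out a free block without touching its contents.
        return self.free_ids.pop()

    def free(self, block_id: int) -> None:
        if self.zero_on_free:
            # Clear the state so the next owner starts from zeros.
            block = self.blocks[block_id]
            for i in range(len(block)):
                block[i] = 0.0
        self.free_ids.append(block_id)


def run_request(pool: ToyStatePool, inputs: list[float]) -> float:
    """Accumulate inputs into a block's state, a stand-in for a recurrent update."""
    block_id = pool.allocate()
    state = pool.blocks[block_id]
    for x in inputs:
        state[0] += x  # the update reads whatever state is already in the block
    result = state[0]
    pool.free(block_id)
    return result


leaky = ToyStatePool(num_blocks=1, block_size=4, zero_on_free=False)
clean = ToyStatePool(num_blocks=1, block_size=4, zero_on_free=True)

# Two back-to-back requests reuse the single block in each pool.
leaky_results = [run_request(leaky, [1.0, 2.0]), run_request(leaky, [5.0])]
clean_results = [run_request(clean, [1.0, 2.0]), run_request(clean, [5.0])]

print(leaky_results)  # second request inherits the first request's state
print(clean_results)  # second request starts from a zeroed block
```

In the pool that does not zero on free, the second request starts from the first request's leftover state. In the zeroing pool, each request sees only its own inputs.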