Patch release: v5.5.2

Transformers (Hugging Face) Releases / 4/9/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • Hugging Face Transformers patch release v5.5.2 focuses on Gemma4 inference optimizations and reliability fixes.
  • The update corrects inference issues related to using `use_cache=False`, caused by key/value (KV) state sharing across layers.
  • It adjusts model weight conversion/serialization mappings so that affected models no longer serialize their weight names inconsistently, including fixes for vision-language models (VLMs).
  • The release includes multiple PRs adding Mixture-of-Experts (MoE) to the Gemma4 tensor-parallel (TP) planning, dissociating KV state sharing from the cache, and removing shared weights while skipping them during loading.
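The weight-name conversion mappings mentioned above can be pictured as a rename table applied to a checkpoint's state dict. Below is a minimal, hypothetical sketch: the regex patterns and the `convert_weight_names` helper are illustrative assumptions, not the actual Transformers conversion API, but they show the kind of inconsistency (two source keys colliding on one target name) that a serialization round-trip must guard against.

```python
import re

# Hypothetical rename table in the spirit of a conversion mapping;
# these patterns are illustrative, not the real Transformers mappings.
WEIGHT_NAME_MAPPING = {
    r"^model\.": "",        # drop a leading "model." prefix
    r"\.gamma$": ".weight", # legacy LayerNorm parameter naming
    r"\.beta$": ".bias",
}

def convert_weight_names(state_dict):
    """Apply each regex substitution to every key, returning a new dict.

    Raises if two source keys collapse onto the same target name --
    exactly the kind of inconsistency a round-trip must avoid.
    """
    converted = {}
    for name, tensor in state_dict.items():
        new_name = name
        for pattern, repl in WEIGHT_NAME_MAPPING.items():
            new_name = re.sub(pattern, repl, new_name)
        if new_name in converted:
            raise ValueError(f"Name collision: {name!r} -> {new_name!r}")
        converted[new_name] = tensor
    return converted
```

A fix like the one in this release amounts to making such a table apply the same way on save and on load, so that converting and re-serializing a checkpoint is a no-op for names that are already canonical.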

A small patch dedicated to optimizing Gemma4. It fixes inference with `use_cache=False`, which was broken by k/v state sharing between layers, and corrects conversion mappings for some models that would inconsistently serialize their weight names. It contains the following PRs: