# Release v5.6.0

## New model additions

### OpenAI Privacy Filter
OpenAI Privacy Filter is a bidirectional token-classification model for detecting and masking personally identifiable information (PII) in text. It is intended for high-throughput data sanitization workflows where teams need a fast, context-aware, and tunable model they can run on-premises. In a single forward pass, the model predicts a probability distribution over 8 privacy-related output categories for each input token, then decodes coherent spans with a constrained Viterbi procedure.
Links: Documentation
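The single-pass labeling plus constrained span decoding described above can be sketched with a small Viterbi decoder. This is a generic illustration under assumed names: the label set, scores, and transition table below are invented for the example, not the model's actual 8 categories or constraints.

```python
def viterbi_decode(scores, labels, allowed):
    """scores: per-token dict of label -> log-score; allowed: set of
    (prev, cur) label transitions permitted between adjacent tokens.
    Returns the highest-scoring label sequence obeying the constraints."""
    NEG = float("-inf")
    # trellis[i][lab] = (best score for a path ending in lab at token i, backpointer)
    trellis = [{lab: (scores[0].get(lab, NEG), None) for lab in labels}]
    for i in range(1, len(scores)):
        row = {}
        for cur in labels:
            cand = [
                (trellis[i - 1][prev][0] + scores[i].get(cur, NEG), prev)
                for prev in labels
                if (prev, cur) in allowed
            ]
            row[cur] = max(cand) if cand else (NEG, None)
        trellis.append(row)
    # Backtrack from the best final label.
    lab = max(trellis[-1], key=lambda l: trellis[-1][l][0])
    path = [lab]
    for i in range(len(scores) - 1, 0, -1):
        lab = trellis[i][lab][1]
        path.append(lab)
    return path[::-1]
```

With BIO-style labels, forbidding the `("O", "I")` transition is what keeps decoded spans coherent: an inside tag can never appear without an opening begin tag, so the decoder is forced to open a span properly even when the raw per-token scores would prefer otherwise.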
### QianfanOCR
Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by Baidu that performs direct image-to-text conversion without traditional multi-stage OCR pipelines. It supports a broad range of prompt-driven tasks, including structured document parsing, table extraction, chart understanding, document question answering, and key information extraction, all within one unified model. The model features a unique "Layout-as-Thought" capability that generates structured layout representations before producing final outputs, making it particularly effective for complex documents with mixed element types.
Links: Documentation | Paper
### SAM3-LiteText
SAM3-LiteText is a lightweight variant of SAM3 that replaces the heavy SAM3 text encoder (353M parameters) with a compact MobileCLIP-based text encoder optimized through knowledge distillation, while keeping the SAM3 ViT-H image encoder intact. This reduces text encoder parameters by up to 88% while maintaining segmentation performance comparable to the original model. The model enables efficient vision-language segmentation by addressing the redundancy found in text prompting for segmentation tasks.
Links: Documentation | Paper
- Add SAM3-LiteText (#44320) by @NielsRogge in [#44320]
### SLANet
SLANet and SLANet_plus are lightweight models designed for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. They improve accuracy and inference speed by adopting a CPU-friendly lightweight backbone network (PP-LCNet), a high-/low-level feature fusion module (CSP-PAN), and a feature decoding module (SLA Head) that aligns structural and positional information. SLANet was developed by the Baidu PaddlePaddle Vision Team as part of its table structure recognition solutions.
Links: Documentation
- [Model] Add SLANet Model Support (#45532) by @zhang-prog in #45532
## Breaking changes
The internal `rotary_fn` is no longer registered as a hidden kernel function, so any code referencing `self.rotary_fn(...)` within an Attention module will break and must be updated to call the function directly instead.
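A minimal before/after sketch of the migration. The class and function names below are toy stand-ins for illustration, not the actual transformers internals:

```python
# Illustrative stand-in: `apply_rotary_fn` here is a toy function, not the
# real rotary-embedding implementation.
def apply_rotary_fn(q, cos, sin):
    # toy rotary application: combine query with cos/sin components
    return q * cos + q * sin

class Attention:
    def forward(self, q, cos, sin):
        # Before v5.6.0, the function was reachable as a registered attribute:
        #     q = self.rotary_fn(q, cos, sin)   # now raises AttributeError
        # After v5.6.0, call the module-level function directly:
        return apply_rotary_fn(q, cos, sin)
```

The fix is purely mechanical: replace the attribute lookup with a direct call to the function at module scope.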
## Serve
The `transformers serve` command received several enhancements, including a new `/v1/completions` endpoint for legacy text completion, multimodal support for audio and video inputs, improved tool calling via `parse_response`, proper forwarding of `tool_calls`/`tool_call_id` fields, a 400 error on model mismatch when the server is pinned to a specific model, and fixes for the response API. Documentation was also updated to cover new serving options such as `--compile` and `--model-timeout`.
- Add `/v1/completions` endpoint (OpenAI legacy completions API) to `transformers serve` (#44558) by @rain-1 in [#44558]
- Updated the image cache for Paddle models according to the latest API (#45562) by @zhang-prog in [#45562]
- Raise 400 on model mismatch when `transformers serve` is pinned (#45443) by @qgallouedec in [#45443]
- [serve] Update tool call to switch to `parse_response` (#45485) by @SunMarc in [#45485]
- Fix response api support (#45463) by @SunMarc in [#45463]
- [serve] Forward `tool_calls`/`tool_call_id` in processor inputs (#45418) by @qgallouedec in [#45418]
- refactor(qa): extend extras so ty can run on server modules (#45456) by @tarekziade in [#45456]
- Multimodal serve support (#45220) by @SunMarc in [#45220]
- [docs] transformers serve (#45174) by @stevhliu in [#45174]
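For reference, the legacy completions endpoint accepts OpenAI-style request bodies. The sketch below only builds such a body; the server address and model id in the comment are illustrative assumptions, not values from the release notes.

```python
import json

# Hypothetical model id -- substitute whatever model your server is running.
payload = {
    "model": "my-org/my-model",
    "prompt": "Once upon a time",
    "max_tokens": 32,
    "temperature": 0.7,
}
body = json.dumps(payload)

# Assuming a local server, the request could then be sent with e.g.:
#   curl http://localhost:8000/v1/completions \
#        -H "Content-Type: application/json" -d "$BODY"
```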
## Vision
Several vision-related bug fixes were applied in this release, including correcting Qwen2.5-VL temporal RoPE scaling for still images, fixing missing/mismatched image processor backends for Emu3 and BLIP, resolving modular image processor class duplication, and preventing accelerate from incorrectly splitting vision encoders in PeVideo/PeAudioVideo models. Image loading performance was also improved by leveraging torchvision's native `decode_image` in the torchvision backend, yielding up to ~17% speedup over PIL-based loading.
- Revert "Fix: modular image processors (#45492)" (#45531) by @tarekziade in [#45531]
- Fix: modular image processors (#45492) by @zucchini-nlp in [#45492]
- fix: prevent accelerate from splitting vision encoder by setting no… (#43047) by @ in [#43047]
- Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by @Kash6 in [#45330]
- Use torchvision `decode_image` to load images in the torchvision backend (#45195) by @yonigozlan in [#45195]
- Fix missing image processors backends (#45165) by @zucchini-nlp in [#45165]
## Parallelization
Fixed several bugs affecting distributed training, including silently wrong results or NaN loss with Expert Parallelism, NaN weights on non-rank-0 FSDP processes, and a resize failure in PP-DocLayoutV3; additionally added support for loading adapters with Tensor Parallelism, added MoE to the Gemma4 TP plan, and published documentation for TP training.
- Fix EP: RouterParallel shape, tp_plan property, grouped_mm sentinels (#45473) by @AmineDiro in [#45473]
- Fix NaN weights on non-rank-0 FSDP processes (#45050) by @albertvillanova in [#45050]
- Load adapter with TP (#45155) by @michaelbenayoun in [#45155]
- [docs] tp training (#44613) by @stevhliu in [#44613]
- Fix resize failure caused by zero-sized masks in PP-DocLayoutV3 (#45281) by @zhang-prog in [#45281]
- Add MoE to Gemma4 TP plan (#45219) by @sywangyi in [#45219]
## Tokenization
Fixed a docstring typo in streamer classes, resolved a Kimi-K2.5 tokenizer regression and a `_patch_mistral_regex` AttributeError, and patched a streaming generation crash for `Qwen3VLProcessor` caused by incorrect `_tokenizer` attribute access. Additional housekeeping included moving the GPT-SW3 instruct tokenizer to an internal testing repo and fixing a global state leak in the tokenizer registry during tests.
- [Doc] Fix 'tokenized' -> 'tokenizer' typo in streamer docstrings (#45508) by @avasis-ai in [#45508]
- Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex AttributeError (#45359) by @ArthurZucker in [#45359]
- fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation (#45368) by @sharziki in [#45368]
- [`Tokenizers`] Move gpt sw3 tokenizer out (#45404) by @vasqu in [#45404]
- fix: leak in tokenizer registry for `test_processors` (#45318) by @tarekziade in [#45318]
## Cache
Cache handling was improved for Gemma4 and Gemma3n models by dissociating KV state sharing from the Cache class, ensuring KV states are always shared regardless of whether a Cache is used. Additionally, the image cache for Paddle models was updated to align with the latest API.
- Align gemma3n cache sharing to gemma4 (#45489) by @Cyrilvallez in [#45489]
- remove cache file from tree (#45392) by @tarekziade in [#45392]
- [gemma4] Dissociate kv states sharing from the Cache (#45312) by @Cyrilvallez in [#45312]
## Audio
Audio models gained vLLM compatibility through targeted fixes across several model implementations. Reliability improvements were also made, including exponential back-off retries for audio file downloads, a crash fix in the text-to-speech pipeline when generation configs contain None values, and corrected test failures for Kyutai Speech-To-Text.
- feat[vLLM × v5]: Add vLLM compatibility for audio models (#45326) by @harshaljanjani in [#45326]
- http retries on audio file downloads (#45126) by @tarekziade in [#45126]
- fix(testing): Fix Kyutai Speech-To-Text and LongCatFlash test failures on main CI (#44695) by @harshaljanjani in [#44695]
- Fix `text-to-speech` pipeline crash when generation config contains `None` values (#45107) by @jiqing-feng in [#45107]
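The retry behavior added for audio downloads follows the standard exponential back-off pattern. The sketch below is a generic illustration of that pattern, not the actual code from #45126; the function name and retry parameters are assumptions.

```python
import time

def fetch_with_backoff(fetch, retries=4, base_delay=0.5):
    """Call `fetch()` until it succeeds, sleeping base_delay * 2**attempt
    seconds between failures; re-raise after the final attempt."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Doubling the delay on each attempt keeps retries cheap for transient network hiccups while backing off quickly enough to avoid hammering a struggling host.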
## Bugfixes and improvements
- [`Privacy Filter`] Add model (#45580) by @vasqu in [#45580]
- Add ForSequenceClassification heads for the OLMo family (#45551) by @earino in [#45551]
- Add IndexCache support for GLM5 DSA (#45424) by @louzongzhi in [#45424]
- Fix redundant logic in video processing SmolVLM (#45272) by @yonigozlan in [#45272]
- Fix typos (#45574) by @vasqu in [#45574]
- [Model] Add SLANet Model Support (#45532) by @zhang-prog in [#45532]
- refactor(Dots1): drop Dots1MoE override to `pass` (inherits from DSV3 MoE) (#45572) by @casinca in [#45572]
- perf: avoid recomputing rotary_emb for each layer in some Google and ModernBERT models (#45555) by @casinca in [#45555]
- Gemma4 training with text-only samples (#45454) by @zucchini-nlp in [#45454]
- [nemotron_h] Add support for MLP mixers (#44763) by @xenova in [#44763]
- add expert parallelism for gemma-4-26B-A4B-it (#45279) by @sywangyi in [#45279]
- Add full GGUF loading support for GPT-OSS (fixes #43366, supersedes #43757) (#45506) by @sirzechs66 in [#45506]
- Update Gemma4 weight conversion script (#45328) by @RyanMullins in [#45328]
- Move some conversion mappings to PrefixChange (#45567) by @Cyrilvallez in [#45567]
- fix table update versions (#45544) by @tarekziade in [#45544]
- Add disable_mmap kwarg to from_pretrained with hf-mount auto-detection (#45547) by @rtrompier in [#45547]
- fix(DSV3): parity between native `DeepseekV3MoE` and remote official implementation (#45441) by @casinca in [#45441]
- [modular] Fix modular logic broken in #45045 (#45539) by @Cyrilvallez in [#45539]
- Fix: propagate quantization_config to text sub-config for composite models in AutoModelForCausalLM (#45494) by @lvliang-intel in [#45494]
- T5Gemma2: fix `prepare_decoder_input_ids_from_labels` (#45516) by @Tokarak in [#45516]
- [Trainer] Add ddp_static_graph option (#45519) by @KeitaW in [#45519]
- Add dtype config options for Four Over Six (#45367) by @jackcook in [#45367]
- [Sam3LiteText] Remove unnecessary modules/configs (#45535) by @yonigozlan in [#45535]
- Fix conditional check for float formatting (#44425) by @qgallouedec in [#44425]
- Fix AMD CI: rebuild torchvision with libjpeg + refresh expectations (#45533) by @Abdennacer-Badaoui in [#45533]
- Reapply modular to examples (#45527) by @Cyrilvallez in [#45527]
- qa: re-run modular converter when the script itself is modified (#45528) by @tarekziade in [#45528]
- [GGUF] Reduce peak RAM usage by casting dequantized tensors early during load (#45386) by @UsamaKenway in [#45386]
- Fix CSM `TextToAudioPipeline` missing `<bos>` token (#45525) by @jiqing-feng in [#45525]
- [`Conversion Mapping`] Small fixups (#45483) by @vasqu in [#45483]
- fix: return empty tuple from import_protobuf_decode_error when protobuf is unavailable (#45486) by @jw9603 in [#45486]
- throw error when conversion required (#45078) by @itazap in [#45078]
- chore: bump doc-builder SHA for PR upload workflow (#45450) by @rtrompier in [#45450]
- xpu output align with cuda in test case (#45526) by @sywangyi in [#45526]
- chore(qa): split out mlinter (#45475) by @tarekziade in [#45475]
- [loading] Clean way to add/remove full parts in checkpoint names (#45448) by @Cyrilvallez in [#45448]
- Fix Zamba2MambaMixer ignoring use_mamba_kernels=False (#44853) by @sergiopaniego in [#44853]
- revert sha commit pointing to main for transformers_amd_ci_ workflows (#45495) by @paulinebm in [#45495]
- Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model (#45402) by @saslifat-gif in [#45402]
- Remove redundant condition checks in `get_image_size` method (#45461) by @JiauZhang in [#45461]
- Add check-auto in repo-consistency and fix sorting (#45481) by @zucchini-nlp in [#45481]
- Fix typos in src/transformers/utils/output_capturing.py (#45269) by @ryota-komatsu in [#45269]
- typing: rule 15 - checks for tie_word_embeddings presence (#44988) by @tarekziade in [#44988]
- [CB] Fix capture of max_seqlen (#45323) by @remi-or in [#45323]
- Minor update (#45484) by @ydshieh in [#45484]
- Add Neuron to auto-compile hardware list (#44757) by @dacorvo in [#44757]
- Allow loading Qwen Thinker 'base' models without generative head (#45457) by @tomaarsen in [#45457]
- [`fix`] Always early return for non-Mistral models in `_patch_mistral_regex` (#45444) by @tomaarsen in [#45444]
- Fix spurious position_ids warnings for at least 40 architectures (#45437) by @tomaarsen in [#45437]
- [`fix`] Make Qwen2_5OmniProcessor warning a lot less noisy via warning_once (
