Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
arXiv cs.CL / 3/13/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- Bielik-Minitron-7B is a compressed 7.35B parameter version of Bielik-11B-v3.0 optimized for European languages (including Polish) using a two-stage compression approach inspired by the NVIDIA Minitron method.
- The compression reduces parameters by 33.4%, from 11.04B to 7.35B, using structured hybrid pruning with NVIDIA Model Optimizer and logit-based distillation with NVIDIA NeMo (both steps are sketched after this list).
- After distillation, an alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and reinforcement learning with Group Relative Policy Optimization (GRPO) was applied to recover model quality.
- The final model reportedly recovers about 90% of the baseline performance while offering up to 50% faster inference, enabling cheaper deployment for less-represented languages.
- This work illustrates a practical pathway, supported by NVIDIA tooling, for deploying efficient language models for European languages while preserving quality and reducing inference costs.
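
The two compression steps referenced above follow the Minitron recipe: first prune structure (width and/or depth) guided by importance scores computed on calibration data, then distill from the original model to recover quality. The sketch below is a minimal, hypothetical illustration of activation-based width pruning for one MLP block; the function, tensor names, and shapes are assumptions and do not reflect the NVIDIA Model Optimizer API or the paper's exact criterion.

```python
import torch

def prune_mlp_width(gate_proj, up_proj, down_proj, calib_acts, keep):
    """Keep the `keep` most important MLP hidden channels.

    gate_proj, up_proj : (hidden, d_model) weight tensors
    down_proj          : (d_model, hidden) weight tensor
    calib_acts         : (num_tokens, hidden) intermediate activations
                         recorded on a small calibration set
    All names and shapes are illustrative, not a real library API.
    """
    # Importance of each hidden channel: mean absolute activation
    # over calibration tokens (a simple activation-based criterion).
    importance = calib_acts.abs().mean(dim=0)                    # (hidden,)
    keep_idx = torch.topk(importance, keep).indices.sort().values
    # Slice the matching rows/columns out of the three projections.
    return gate_proj[keep_idx], up_proj[keep_idx], down_proj[:, keep_idx]

if __name__ == "__main__":
    # Toy dimensions purely for demonstration.
    d_model, hidden, tokens = 64, 256, 1024
    g, u = torch.randn(hidden, d_model), torch.randn(hidden, d_model)
    d = torch.randn(d_model, hidden)
    acts = torch.randn(tokens, hidden)
    g2, u2, d2 = prune_mlp_width(g, u, d, acts, keep=192)
    print(g2.shape, u2.shape, d2.shape)  # (192, 64) (192, 64) (64, 192)
```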

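Logit-based distillation then trains the pruned student to match the teacher's full next-token distribution rather than one-hot labels. Below is a minimal PyTorch sketch of a per-token KL distillation loss, assuming both models expose logits of shape (batch, seq_len, vocab); the temperature and reduction are illustrative defaults, not the hyperparameters reported in the paper.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary, averaged per token."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1).flatten(0, 1)
    student_logp = F.log_softmax(student_logits / t, dim=-1).flatten(0, 1)
    # kl_div expects log-probabilities for the input and probabilities
    # for the target; "batchmean" averages the summed KL over all tokens.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t

# In a training loop the teacher runs frozen and only the student
# receives gradients, e.g. (model calls are schematic):
#   with torch.no_grad():
#       t_logits = teacher(input_ids).logits
#   s_logits = student(input_ids).logits
#   loss = logit_distillation_loss(s_logits, t_logits)
```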