Long-Context Encoder Models for Polish Language Understanding
arXiv cs.CL / 3/13/2026
Key Points
- The paper introduces a Polish encoder-only model that processes sequences of up to 8,192 tokens, addressing the short-context limitation (typically 512 tokens) of traditional BERT-style encoders.
- It uses a two-stage training procedure: positional embedding adaptation, followed by full-parameter continued pre-training. Compressed variants produced via knowledge distillation trade some capacity for efficiency (see the sketches after this list).
- Evaluations across 25 tasks, including KLEJ and FinBench, show the model achieves the best average performance among Polish and multilingual models on long-context tasks while preserving short-text quality.
- The work, released as arXiv:2603.12191v1, represents meaningful progress for long-document understanding in Polish and, more broadly, in multilingual NLP.
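
The summary does not specify how the positional embeddings are adapted. A common approach for stretching a BERT-style learned position table from 512 to 8,192 slots is linear interpolation of the embedding matrix; the sketch below illustrates that technique only, not necessarily the paper's procedure. The function name `extend_position_embeddings` and the 768-dimensional example are placeholders.

```python
import torch
import torch.nn.functional as F

def extend_position_embeddings(old_emb: torch.Tensor, new_len: int) -> torch.Tensor:
    """Linearly interpolate a learned position-embedding matrix
    of shape (old_len, hidden) up to (new_len, hidden)."""
    old_len, hidden = old_emb.shape
    # F.interpolate expects (batch, channels, length) for 1-D interpolation
    resized = F.interpolate(
        old_emb.t().unsqueeze(0),   # (1, hidden, old_len)
        size=new_len,
        mode="linear",
        align_corners=False,
    )
    return resized.squeeze(0).t()   # (new_len, hidden)

# Example: adapt a 512-position BERT-style table to 8,192 positions
old = torch.randn(512, 768)
new = extend_position_embeddings(old, 8192)
assert new.shape == (8192, 768)
```

After such an adaptation step, the second stage (full-parameter continued pre-training on long documents) lets the model learn to actually use the new positions rather than merely tolerate them.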
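The distillation objective behind the compressed variants is likewise not described here. A standard formulation is temperature-scaled soft-target KL divergence between teacher and student logits (Hinton et al., 2015); a minimal sketch, assuming that formulation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-target KL loss from standard knowledge distillation.
    The paper's exact objective may differ (e.g., it could also
    match hidden states or attention maps)."""
    t = temperature
    s = F.log_softmax(student_logits / t, dim=-1)  # student log-probs
    p = F.softmax(teacher_logits / t, dim=-1)      # teacher soft targets
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures
    return F.kl_div(s, p, reduction="batchmean") * (t * t)
```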