HUKUKBERT: Domain-Specific Language Model for Turkish Law
arXiv cs.CL / 4/7/2026
Key Points
- HukukBERT is introduced as a domain-specific language model for Turkish legal NLP, trained on an 18GB cleaned Turkish legal corpus using hybrid Domain-Adaptive Pre-Training (DAPT).
- The paper details a targeted pretraining recipe that combines multiple masking strategies (Whole-Word, Token-Span, Word-Span, and Keyword masking) with a 48K-vocabulary WordPiece tokenizer, and benchmarks the results against both general-purpose and existing Turkish legal models.
- On a newly proposed Legal Cloze Test benchmark built from Turkish court decisions, HukukBERT reaches 84.40% Top-1 accuracy, a new state of the art.
- On the downstream task of structurally segmenting official Turkish court decisions, the model achieves a 92.8% document-level pass rate, also a new state of the art.
- The authors release HukukBERT with the goal of enabling future Turkish legal NLP research such as named-entity recognition, judgment prediction, and legal document classification.
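The Whole-Word Masking strategy mentioned above can be illustrated with a short, self-contained sketch: when a word is selected for masking, all of its WordPiece sub-tokens (the `##`-prefixed continuations) are masked together rather than independently. The function below is a hypothetical illustration, not the authors' code, and the example tokens are an assumed WordPiece-style segmentation of a Turkish legal phrase.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Sketch of Whole-Word Masking over WordPiece tokens.

    A token starting with '##' continues the previous word, so sub-tokens
    are grouped into whole words first; each word is then masked as a unit
    with probability mask_prob. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Group token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:  # mask every piece of the selected word
                masked[i] = mask_token
    return masked

# Assumed WordPiece segmentation of "mahkemenin kararı kesinleşti"
toks = ["mahkeme", "##nin", "karar", "##ı", "kesin", "##leş", "##ti"]
print(whole_word_mask(toks, mask_prob=0.5, seed=1))
```

The key property, as opposed to BERT's original per-token masking, is that a model can never recover a masked sub-token by copying its visible neighbors within the same word; the other strategies the paper lists (Token-Span, Word-Span, Keyword masking) extend the same idea to longer spans and to domain-salient legal terms.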