The Russian Legislative Corpus
arXiv cs.CL / 4/29/2026
Key Points
- The paper introduces a comprehensive corpus of Russian legislation spanning 1991 to 2025, comprising 304,382 legal texts and about 194.4 million tokens.
- It provides two dataset versions: a basic release with simple metadata and a detailed release that includes original texts plus Universal Dependencies CoNLL-U conversions.
- The detailed version enriches the data with linguistic annotations such as parts of speech, morphological features, and syntactic dependency relations.
- The authors position the corpus as a resource for research and development on Russian legal language, particularly downstream tasks that need structured, annotated text.
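
To make the CoNLL-U layer of the detailed release more concrete, here is a minimal sketch of reading one Universal Dependencies sentence in Python using only the standard library. It relies on the standard ten-column CoNLL-U layout; the sample sentence ("Закон вступает в силу.") and its annotations are written by hand for illustration and are not taken from the corpus. The names `Token`, `parse_feats`, and `parse_sentence` are hypothetical helpers, and this summary does not describe how the corpus files are actually organized.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str      # universal part-of-speech tag
    feats: dict    # morphological features, e.g. {"Case": "Nom"}
    head: int      # id of the syntactic head, 0 for the root
    deprel: str    # dependency relation to the head


def parse_feats(raw):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_sentence(block):
    """Parse one CoNLL-U sentence block into a list of Token objects."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(
            id=int(cols[0]), form=cols[1], lemma=cols[2], upos=cols[3],
            feats=parse_feats(cols[5]), head=int(cols[6]), deprel=cols[7],
        ))
    return tokens


# Hand-written illustrative sentence, NOT taken from the corpus.
SAMPLE_ROWS = [
    ("1", "Закон", "закон", "NOUN", "_",
     "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing", "2", "nsubj", "_", "_"),
    ("2", "вступает", "вступать", "VERB", "_",
     "Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"),
    ("3", "в", "в", "ADP", "_", "_", "4", "case", "_", "_"),
    ("4", "силу", "сила", "NOUN", "_",
     "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing", "2", "obl", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "# text = Закон вступает в силу.\n" + "\n".join(
    "\t".join(row) for row in SAMPLE_ROWS
)

tokens = parse_sentence(SAMPLE)
root = next(t for t in tokens if t.head == 0)
for t in tokens:
    print(t.id, t.form, t.upos, t.deprel, t.head)
```

Each token carries the three annotation layers named above: the part of speech (`upos`), the morphological features (`feats`), and the dependency relation with its head (`deprel`, `head`). For real workloads an established CoNLL-U reader would likely be a better choice than this hand-rolled parser.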