The Russian Legislative Corpus

arXiv cs.CL / 4/29/2026

💬 Opinion | Developer Stack & Infrastructure | Models & Research

Key Points

The article introduces a corpus of Russian primary and secondary legislation adopted between 1991 and 2025, totaling 304,382 legal texts and about 194.4 million tokens.
It provides two dataset versions: a basic release with simple metadata and a detailed release that includes original texts plus Universal Dependencies CoNLL-U conversions.
The detailed version enriches the data with linguistic annotations such as parts of speech, morphological features, and syntactic dependency relations.
The corpus is positioned as a resource for working with Russian legal language in downstream research and development tasks requiring structured, annotated text.

Abstract

We present a comprehensive corpus of Russian primary and secondary legislation adopted between 1991 and 2025, comprising 304,382 texts (194,425,905 tokens). The corpus is available in two versions: the basic version contains texts with simple metadata, while the detailed version includes both the original texts and their equivalents converted to the Universal Dependencies CoNLL-U format, annotated with parts of speech, morphological features, and syntactic dependencies.
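The CoNLL-U layout mentioned in the abstract stores one token per line in ten tab-separated columns, with blank lines between sentences and `#` lines for comments. As a minimal sketch of how such a file could be read, the Python below parses CoNLL-U text using only the standard library. The sample sentence and its annotation are hand-written for illustration and are not taken from the corpus; the names `Token`, `parse_feats`, and `read_conllu` are likewise hypothetical helpers, not part of any release.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Token:
    """Subset of the ten CoNLL-U columns used in this sketch."""
    id: str
    form: str
    lemma: str
    upos: str
    feats: Dict[str, str]
    head: str
    deprel: str


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment such as '# text = ...'.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(
            Token(cols[0], cols[1], cols[2], cols[3],
                  parse_feats(cols[5]), cols[6], cols[7])
        )
    if sentence:
        yield sentence


# Illustrative sample only (assumed annotation, not corpus data):
# "Закон вступает в силу." ("The law enters into force.")
SAMPLE_ROWS = [
    ["1", "Закон", "закон", "NOUN", "_",
     "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "вступает", "вступать", "VERB", "_",
     "Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act",
     "0", "root", "_", "_"],
    ["3", "в", "в", "ADP", "_", "_", "4", "case", "_", "_"],
    ["4", "силу", "сила", "NOUN", "_",
     "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing", "2", "obl", "_", "SpaceAfter=No"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = (
    "# text = Закон вступает в силу.\n"
    + "\n".join("\t".join(row) for row in SAMPLE_ROWS)
    + "\n\n"
)

sentences = list(read_conllu(SAMPLE))
for tok in sentences[0]:
    print(tok.id, tok.form, tok.upos, tok.head, tok.deprel, tok.feats.get("Case", "-"))
```

Each token exposes its part of speech (`upos`), morphological features (`feats`), and dependency relation (`head`, `deprel`), which correspond to the three annotation layers the abstract lists. For real work, an established CoNLL-U reader would likely be preferable to this hand-rolled parser, which ignores details such as multiword-token ranges.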

Related Articles

How I Use AI Agents to Maintain a Living Knowledge Base for My Team

Dev.to

An API testing tool built specifically for AI agent loops

Dev.to

IK_LLAMA now supports Qwen3.5 MTP Support :O

Reddit r/LocalLLaMA

OpenAI models, Codex, and Managed Agents come to AWS

Dev.to

Automatic Error Recovery in AI Agent Networks

Dev.to
