From Legal Text to Executable Decision Models: Evaluating Structured Representations for Legal Decision Model Generation

arXiv cs.CL / 4/21/2026


Key Points

  • The study investigates whether intermediate structured representations can help LLMs generate executable legal decision models from legal text, addressing the high cost of manual coding and evaluation in legal informatics.
  • Using a real-world dataset linking Dutch Environment and Planning Act text to production decision models powering the Omgevingsloket platform, the authors compare four input conditions: raw text alone, text enriched with semantic role labels, text enriched with input/output constraints, and text enriched with both.
  • The strongest gains come from adding input/output constraints, improving structural similarity by about 37–54% over the baseline, while semantic role labels yield only modest improvements.
  • On functional (outcome) evaluation, generated models match the gold standard on 51–53% of pre-configured test scenarios, and the generated models tend to be smaller and simpler.
  • Structural similarity and outcome equivalence are found to be complementary: high structural overlap does not necessarily imply correct behavior, and behavioral correctness does not always follow from structural similarity. The authors release the dataset (95 models) and full experimental code for reproducibility.
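The structural-similarity comparison can be sketched with a minimal Weisfeiler-Lehman subtree kernel over two decision-model graphs. This is an illustrative sketch only: the node labels, edge lists, and the specific kernel configuration below are assumptions, not the paper's actual setup.

```python
from collections import Counter

def wl_histograms(nodes, edges, iterations=2):
    """Weisfeiler-Lehman label refinement.

    nodes: {node_id: label}; edges: [(src, dst)] (treated as undirected).
    Returns a Counter over all labels seen across refinement rounds.
    """
    neighbors = {n: [] for n in nodes}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    labels = dict(nodes)
    hist = Counter(labels.values())
    for _ in range(iterations):
        # Each node's new label combines its label with its sorted
        # neighborhood labels, capturing local graph structure.
        labels = {
            n: labels[n] + "|" + "".join(sorted(labels[m] for m in neighbors[n]))
            for n in nodes
        }
        hist.update(labels.values())
    return hist

def wl_similarity(g1, g2, iterations=2):
    """Cosine similarity between the WL label histograms of two graphs."""
    h1 = wl_histograms(*g1, iterations)
    h2 = wl_histograms(*g2, iterations)
    dot = sum(h1[k] * h2[k] for k in h1)
    norm = (sum(v * v for v in h1.values()) * sum(v * v for v in h2.values())) ** 0.5
    return dot / norm if norm else 0.0
```

Identical graphs score 1.0; a generated model that drops or relabels nodes (as the smaller, simpler generated models in the study do) scores strictly below 1.0.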

Abstract

Transforming legal text into executable decision logic is a longstanding challenge in legal informatics. With the rise of LLMs, this task has gained renewed interest, but it remains challenging because it requires extensive manual coding and evaluation. We use a unique real-world dataset that pairs production-grade decision models with legal text from the Dutch Environment and Planning Act. These models power the Omgevingsloket government platform, where citizens check permit requirements for environmental activities. We study whether intermediate structured representations can improve LLM-based generation of executable decision models from legal text. We compare four input conditions: raw legal text, text enriched with semantic role labels, text enriched with input and output constraints, and text enriched with both. We evaluate along two dimensions: structural evaluation, measuring similarity to gold decision models via graph kernels and descriptive graph statistics, and outcome evaluation, measuring functional equivalence by executing models on pre-configured test scenarios. Our findings show that I/O constraints provide the dominant improvement (+37-54% similarity over baseline), while semantic role labels show modest improvements. Outcome evaluation shows that generated models match the gold standard on 51-53% of test scenarios, even though generated models are typically smaller and simpler. We find that LLMs eliminate redundant pass-through logic that comprises up to 45-55% of nodes. Importantly, structural similarity and outcome equivalence are complementary: structural similarity does not guarantee outcome equivalence, and vice versa. To facilitate reproducibility, we publicly release our dataset of 95 production decision models with associated legal text and all experimental code.
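The outcome evaluation described above can be sketched as a match-rate computation: run the gold and the generated decision model on the same pre-configured test scenarios and report the fraction of scenarios where the decisions agree. The model interface (a callable from scenario inputs to a decision) and the toy permit rule are assumptions for illustration, not the paper's implementation.

```python
def outcome_match_rate(gold_model, generated_model, scenarios):
    """Fraction of test scenarios on which both models return the same decision."""
    matches = sum(1 for s in scenarios if gold_model(s) == generated_model(s))
    return matches / len(scenarios) if scenarios else 0.0

# Hypothetical permit-check models that disagree only on the boundary case:
gold = lambda s: "permit required" if s["area_m2"] > 50 else "no permit"
generated = lambda s: "permit required" if s["area_m2"] >= 50 else "no permit"

scenarios = [{"area_m2": a} for a in (10, 50, 80, 120)]
print(outcome_match_rate(gold, generated, scenarios))  # → 0.75
```

A score like the paper's 51-53% would mean the generated model agrees with the production model on roughly half of such scenarios, even when the two graphs differ structurally.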