Democratizing the medieval English legal tradition
arXiv cs.CV / 5/5/2026
Key Points
- The project digitizes early Anglo-American legal records written in abbreviated medieval Latin by creating a dataset covering 193 medieval criminal and civil cases.
- It trains open-source, end-to-end neural pipelines for line segmentation and handwriting recognition, achieving 79% word accuracy with models like R-Billa and CNN+LSTM (CTC decoding).
- Post-processing improves performance: adding an n-gram language model raises word accuracy to 82%, and using Gemini Pro 3 for error correction increases it to 88%.
- A comparison between CNN+LSTM and TrOCR shows similar word accuracy, but TrOCR has worse character accuracy because it “guesses” more, which can make human verification harder.
- The resulting system is deployed via a public web portal (glyphmachina.com) to broaden access for legal scholars, medievalists, and students.
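
The gap between word accuracy and character accuracy noted above can be illustrated with a small sketch. It assumes both metrics are defined as 1 minus the Levenshtein edit distance divided by the reference length, over words and over characters respectively; the summary does not state the paper's exact definitions. The Latin sample line and the two hypothetical model outputs are invented for illustration and are not taken from the dataset.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (of characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref: Sequence, hyp: Sequence) -> float:
    """Assumed definition: 1 - edit_distance / reference length, floored at 0."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def word_accuracy(ref: str, hyp: str) -> float:
    return accuracy(ref.split(), hyp.split())


def char_accuracy(ref: str, hyp: str) -> float:
    return accuracy(list(ref), list(hyp))


# Invented example line and outputs (illustrative only).
reference = "rex mandavit vicecomiti quod"
# A model that misreads a single character inside one word.
near_miss = "rex mandavit vicecomjti quod"
# A model that replaces the same word with a different, plausible-looking one.
guess = "rex mandavit preceptum quod"

if __name__ == "__main__":
    for name, hyp in [("near miss", near_miss), ("guess", guess)]:
        print(name,
              round(word_accuracy(reference, hyp), 3),
              round(char_accuracy(reference, hyp), 3))
```

Both outputs get exactly one word in four wrong, so their word accuracy is identical, but the guessed word differs from the reference in many more characters. Under these assumptions, that is one way similar word accuracy can coexist with lower character accuracy, and a wholesale substitution of this kind is harder for a human checker to spot than a single misread letter.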