Towards the AI Historian: Agentic Information Extraction from Primary Sources

arXiv cs.AI / 4/7/2026


Key Points

  • The paper introduces Chronos, an “AI Historian” under development that aims to let historians perform agentic information extraction from primary sources, an area where few AI solutions are designed for their needs.
  • The first Chronos module supports converting image scans of primary sources into structured data via natural-language interactions, rather than relying on a single fixed VLM-powered extraction pipeline.
  • Chronos is designed to let historians adapt extraction workflows to heterogeneous document corpora, evaluate model performance on specific tasks, and iteratively refine workflows through interaction with the agent.
  • The module is open-source and, according to the authors, ready for historians to use on their own sources.
  • Overall, the work frames historical research as an area needing tailor-made AI tooling and proposes an agentic, human-in-the-loop approach to extraction and workflow control.

Abstract

AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress report, we introduce the first module of Chronos, an AI Historian under development. This module enables historians to convert image scans of primary sources into data through natural-language interactions. Rather than imposing a fixed extraction pipeline powered by a vision-language model (VLM), it allows historians to adapt workflows for heterogeneous source corpora, evaluate the performance of AI models on specific tasks, and iteratively refine workflows through natural-language interaction with the Chronos agent. The module is open-source and ready to be used by historical researchers on their own sources.
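
The abstract mentions evaluating model performance on specific tasks but does not say how Chronos measures it. As a rough illustration only, one common way to check an extraction workflow is to compare its output against a small hand-transcribed sample, field by field. The sketch below assumes each record is a plain dictionary of strings; the names `FieldScore`, `normalize`, and `field_accuracy` are hypothetical and are not part of Chronos.

```python
from dataclasses import dataclass


@dataclass
class FieldScore:
    """Running count of exact matches for one field (hypothetical helper)."""
    correct: int = 0
    total: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case before comparing two values."""
    return " ".join(value.split()).casefold()


def field_accuracy(gold: list[dict[str, str]],
                   predicted: list[dict[str, str]]) -> dict[str, FieldScore]:
    """Score predicted records against hand-transcribed gold records.

    Records are paired by position; a field missing from a prediction
    counts as an error. This is an assumed metric, not the paper's.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores: dict[str, FieldScore] = {}
    for gold_rec, pred_rec in zip(gold, predicted):
        for field, gold_value in gold_rec.items():
            score = scores.setdefault(field, FieldScore())
            score.total += 1
            if normalize(pred_rec.get(field, "")) == normalize(gold_value):
                score.correct += 1
    return scores
```

A per-field breakdown like this makes it easier to see which parts of a workflow need refinement (for example, names read poorly while dates are fine) than a single overall score would.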