AI Navigate

Temporal Text Classification with Large Language Models

arXiv cs.CL / 3/13/2026


Key Points

  • The paper conducts a systematic evaluation of temporal text classification (TTC) using leading proprietary LLMs (Claude 3.5, GPT-4o, Gemini 1.5) and open-source LLMs (LLaMA 3.2, Gemma 2, Mistral, Nemotron 4) across three historical corpora (two English, one Portuguese), under zero-shot prompting, few-shot prompting, and fine-tuning settings.
  • Proprietary models show strong TTC performance, particularly with few-shot prompting.
  • Open-source models improve with fine-tuning but still do not match the performance of proprietary LLMs.
  • The study highlights implications for prompt design, model selection, and future research in dating historical texts.
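The zero-shot and few-shot prompting settings evaluated in the paper can be sketched as prompt templates. The passages, decade labels, and template wording below are illustrative assumptions, not the paper's actual prompts or data:

```python
# Illustrative sketch of zero-shot vs. few-shot prompting for temporal
# text classification (estimating when a text was published).
# All example texts and decade labels are invented for illustration.

ZERO_SHOT_TEMPLATE = (
    "Estimate the decade in which the following text was published. "
    "Answer with a single decade, e.g. 1890s.\n\n"
    "Text: {text}\nDecade:"
)

# Hypothetical labeled demonstrations prepended in the few-shot setting.
FEW_SHOT_EXAMPLES = [
    ("Thou art most welcome to our humble abode.", "1610s"),
    ("The telegraph has quite transformed our correspondence.", "1860s"),
]

def build_few_shot_prompt(text: str) -> str:
    """Prepend labeled example passages before the query text."""
    demos = "\n\n".join(
        f"Text: {t}\nDecade: {d}" for t, d in FEW_SHOT_EXAMPLES
    )
    return demos + "\n\n" + ZERO_SHOT_TEMPLATE.format(text=text)

prompt = build_few_shot_prompt("The motor-car sped along the new highway.")
print(prompt)
```

The completed prompt would then be sent to any of the evaluated models; in the zero-shot setting, `ZERO_SHOT_TEMPLATE` is used on its own, with no demonstrations.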

Abstract

Languages change over time. Computational models can be trained to recognize such changes, enabling them to estimate the publication date of texts. Despite recent advancements in Large Language Models (LLMs), their performance on the automatic dating of texts, also known as Temporal Text Classification (TTC), has not been explored. This study provides the first systematic evaluation of leading proprietary (Claude 3.5, GPT-4o, Gemini 1.5) and open-source (LLaMA 3.2, Gemma 2, Mistral, Nemotron 4) LLMs on TTC using three historical corpora, two in English and one in Portuguese. We test zero-shot prompting, few-shot prompting, and fine-tuning settings. Our results indicate that proprietary models perform well, especially with few-shot prompting. They also indicate that fine-tuning substantially improves open-source models, but that these still fail to match the performance delivered by proprietary LLMs.