LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?

arXiv cs.CL · April 21, 2026


Key Points

  • The paper proposes LiFT, a longitudinal instruction fine-tuning framework aimed at improving large language models’ ability to reason over temporally ordered text for persistence and change detection.
  • LiFT pairs a shared instruction schema spanning multiple longitudinal NLP tasks with a curriculum that gradually increases temporal difficulty, alongside few-shot structuring and temporal conditioning (see the schema sketch after this list).
  • The authors evaluate LiFT on five datasets; models trained at different temporal granularities are additionally tested for cross-dataset generalization on two held-out datasets.
  • Across multiple model sizes (OLMo 1B/7B, LLaMA-8B, and Qwen-14B), LiFT improves over base-model in-context learning, showing especially strong gains on out-of-distribution data and rare/minority change events.
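To make the "shared instruction schema" concrete, here is a minimal Python sketch of what such a prompt builder could look like. The class name, field names, prompt layout, and label set ("persist"/"change") are assumptions for exposition, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LongitudinalExample:
    """One temporally ordered instance: a history of observations plus a label."""
    timestamps: list[str]        # e.g. ISO dates, oldest first (assumed field)
    texts: list[str]             # text observed at each timestamp
    label: Optional[str] = None  # e.g. "persist" / "change"; None for the query

def build_prompt(task_instruction: str,
                 few_shot: list[LongitudinalExample],
                 query: LongitudinalExample) -> str:
    """Render one prompt in a shared schema: a task instruction, few-shot
    demonstrations with answers, then the temporally conditioned query."""
    def render(ex: LongitudinalExample) -> str:
        # Temporal conditioning: each observation is tagged with its timestamp.
        history = "\n".join(f"[{t}] {txt}"
                            for t, txt in zip(ex.timestamps, ex.texts))
        answer = f"Answer: {ex.label}" if ex.label is not None else "Answer:"
        return f"History:\n{history}\n{answer}"

    blocks = [f"Instruction: {task_instruction}"]
    blocks += [render(ex) for ex in few_shot]   # few-shot structuring
    blocks.append(render(query))                # query left unanswered
    return "\n\n".join(blocks)
```

The payoff of a shared schema is that the same builder serves every task in the suite: only `task_instruction` and the example pool change across datasets.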

Abstract

Longitudinal NLP tasks require reasoning over temporally ordered text to detect persistence and change in human behavior and opinions. However, in-context learning with large language models struggles on tasks where models must integrate historical context, track evolving interactions, and handle rare change events. We introduce LiFT, a longitudinal instruction fine-tuning framework that unifies diverse longitudinal modeling tasks under a shared instruction schema. LiFT uses a curriculum that progressively increases temporal difficulty while incorporating few-shot structure and temporal conditioning to encourage effective use of past context. We evaluate LiFT across five datasets; to test generalisability, models trained on longitudinal tasks at different levels of temporal granularity are evaluated on two held-out datasets. Across models of different parameter sizes (OLMo 1B/7B, LLaMA-8B, and Qwen-14B), LiFT consistently outperforms base-model ICL, with strong gains on out-of-distribution data and minority change events.
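The curriculum component orders training data from temporally easy to hard. The sketch below (reusing the hypothetical LongitudinalExample class above) uses history length as an assumed proxy for temporal difficulty; the paper's actual difficulty measure is not specified here and may differ.

```python
def curriculum_order(examples: list[LongitudinalExample],
                     difficulty=None) -> list[LongitudinalExample]:
    """Sort training examples from temporally easy to hard.

    By default, difficulty is the number of timesteps in the history,
    an assumed proxy; the framework's actual criterion may differ.
    """
    if difficulty is None:
        difficulty = lambda ex: len(ex.timestamps)
    return sorted(examples, key=difficulty)
```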