LLM REgression with a Latent Iterative State Head

arXiv cs.CL / 4/3/2026


Key Points

  • The paper introduces RELISH, a lightweight architecture for LLM-based text regression that predicts scalar targets directly rather than generating text-form numeric outputs or combining multiple generations.
  • RELISH uses a learned latent iterative state refined via cross-attention over token-level representations, ending with a linear regressor to produce the final point estimate.
  • Experiments across five datasets, four LLM backbones, and two training regimes show RELISH consistently outperforming prior baselines from all three major LLM regression families: autoregressive decoding, regression-aware inference, and existing predictive-head approaches.
  • The approach is highly parameter-efficient, adding only about 3.4–3.7M trainable parameters on top of frozen backbones (roughly 0.01–0.04%), which is far smaller than LoRA-style methods reported as adding 0.26–0.42% overhead.
  • Overall, RELISH targets improved accuracy for regression tasks while keeping fine-tuning cost low by training only a compact head/state module.

Abstract

We present RELISH (REgression with a Latent Iterative State Head), a novel, lightweight architecture designed for text regression with large language models. Rather than decoding numeric targets as text or aggregating multiple generated outputs, RELISH predicts scalar values directly from frozen LLM representations by iteratively refining a learned latent state through cross-attention over token-level representations, and then mapping the final state to a point estimate with a linear regressor. Across five datasets, four LLM backbones, and two LLM training regimes, RELISH consistently outperforms prior baselines from all three major LLM regression families: autoregressive decoding, regression-aware inference, and existing predictive-head methods. Despite these gains, RELISH remains highly parameter-efficient, requiring only 3.4–3.7M trainable parameters on top of frozen LLM backbones (0.01–0.04% additional overhead), far less than LoRA-based alternatives, whose parameter counts grow with model size (0.26–0.42%).
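
The head described above can be sketched as a small PyTorch module. This is an illustrative reconstruction, not the authors' implementation: the hidden dimension, number of attention heads, number of refinement steps, and the residual-plus-LayerNorm update are assumptions chosen to make the sketch concrete; only the overall shape (learned latent state, cross-attention over token representations, linear regressor) comes from the paper's description.

```python
import torch
import torch.nn as nn

class LatentIterativeStateHead(nn.Module):
    """Sketch of a RELISH-style head: a learned latent state is refined
    for a fixed number of steps via cross-attention over frozen LLM
    token representations, then mapped to a scalar by a linear regressor.
    All hyperparameters here are illustrative assumptions."""

    def __init__(self, hidden_dim=768, num_heads=8, num_steps=4):
        super().__init__()
        # Learned initial latent state: a single trainable "query" vector.
        self.state = nn.Parameter(torch.randn(1, 1, hidden_dim))
        self.cross_attn = nn.MultiheadAttention(
            hidden_dim, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(hidden_dim)
        self.regressor = nn.Linear(hidden_dim, 1)
        self.num_steps = num_steps

    def forward(self, token_reps):
        # token_reps: (batch, seq_len, hidden_dim) from a frozen backbone.
        batch = token_reps.size(0)
        state = self.state.expand(batch, -1, -1)
        for _ in range(self.num_steps):
            # The latent state attends over token-level representations.
            attn_out, _ = self.cross_attn(state, token_reps, token_reps)
            # Residual refinement (an assumed update rule).
            state = self.norm(state + attn_out)
        # Linear regressor maps the final state to one scalar per example.
        return self.regressor(state.squeeze(1)).squeeze(-1)

head = LatentIterativeStateHead()
reps = torch.randn(2, 16, 768)   # stand-in for frozen LLM outputs
preds = head(reps)               # shape: (2,), one scalar per input
```

Because only the head's parameters are trainable, training with a frozen backbone reduces to optimizing this module alone (e.g. an MSE loss on `preds`), which is what keeps the added parameter count in the low millions.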