LLM REgression with a Latent Iterative State Head

arXiv cs.CL / 4/3/2026


Key Points

  • The paper introduces RELISH, a lightweight architecture for LLM-based text regression that predicts scalar targets directly rather than generating text-form numeric outputs or combining multiple generations.
  • RELISH uses a learned latent iterative state refined via cross-attention over token-level representations, ending with a linear regressor to produce the final point estimate.
  • Experiments across five datasets, four LLM backbones, and two training regimes show RELISH consistently outperforming prior baselines from all three major LLM regression families: autoregressive decoding, regression-aware inference, and existing predictive-head approaches.
  • The approach is highly parameter-efficient, adding only about 3.4–3.7M trainable parameters on top of frozen backbones (roughly 0.01–0.04%), which is far smaller than LoRA-style methods reported as adding 0.26–0.42% overhead.
  • Overall, RELISH targets improved accuracy for regression tasks while keeping fine-tuning cost low by training only a compact head/state module.

Abstract

We present RELISH (REgression with a Latent Iterative State Head), a novel, lightweight architecture designed for text regression with large language models. Rather than decoding numeric targets as text or aggregating multiple generated outputs, RELISH predicts scalar values directly from frozen LLM representations by iteratively refining a learned latent state through cross-attention over token-level representations, and then mapping the final state to a point estimate with a linear regressor. Across five datasets, four LLM backbones, and two LLM training regimes, RELISH consistently outperforms prior baselines from all three major LLM regression families: autoregressive decoding, regression-aware inference, and existing predictive-head methods. Despite these gains, RELISH remains highly parameter-efficient, requiring only 3.4–3.7M trainable parameters on top of frozen LLM backbones (0.01–0.04% additional overhead), far less than LoRA-based alternatives, whose parameter counts grow with model size (0.26–0.42%).
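
The head described above can be sketched as a small PyTorch module. This is an illustrative reconstruction, not the authors' implementation: the hidden dimension, number of attention heads, number of refinement steps, and the residual-plus-LayerNorm update are assumptions chosen to make the sketch concrete; only the overall shape (learned latent state, cross-attention over token representations, linear regressor) comes from the paper's description.

```python
import torch
import torch.nn as nn

class LatentIterativeStateHead(nn.Module):
    """Sketch of a RELISH-style head: a learned latent state is refined
    for a fixed number of steps via cross-attention over frozen LLM
    token representations, then mapped to a scalar by a linear regressor.
    All hyperparameters here are illustrative assumptions."""

    def __init__(self, hidden_dim=768, num_heads=8, num_steps=4):
        super().__init__()
        # Learned initial latent state: a single trainable "query" vector.
        self.state = nn.Parameter(torch.randn(1, 1, hidden_dim))
        self.cross_attn = nn.MultiheadAttention(
            hidden_dim, num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(hidden_dim)
        self.regressor = nn.Linear(hidden_dim, 1)
        self.num_steps = num_steps

    def forward(self, token_reps):
        # token_reps: (batch, seq_len, hidden_dim) from a frozen backbone.
        batch = token_reps.size(0)
        state = self.state.expand(batch, -1, -1)
        for _ in range(self.num_steps):
            # The latent state attends over token-level representations.
            attn_out, _ = self.cross_attn(state, token_reps, token_reps)
            # Residual refinement (an assumed update rule).
            state = self.norm(state + attn_out)
        # Linear regressor maps the final state to one scalar per example.
        return self.regressor(state.squeeze(1)).squeeze(-1)

head = LatentIterativeStateHead()
reps = torch.randn(2, 16, 768)   # stand-in for frozen LLM outputs
preds = head(reps)               # shape: (2,), one scalar per input
```

Because only the head's parameters are trainable, training with a frozen backbone reduces to optimizing this module alone (e.g. an MSE loss on `preds`), which is what keeps the added parameter count in the low millions.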