LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models
arXiv cs.AI / 3/23/2026
💬 Opinion / Models & Research
Key Points
- LARFT integrates length-oriented reinforcement learning with hindsight length awareness to align a model's length cognition with its generation actions.
- It converts on-policy data into hindsight self-awareness tasks, enabling the model to identify the actual length of its own generated text.
- Across four base models, LARFT outperforms baselines, improving by an average of 20.92 points on three length instruction-following benchmarks while declining only a modest 1.45 points on four general-capability benchmarks.
- The results indicate improved precision and reliability in satisfying length constraints without substantially sacrificing general capabilities.
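The hindsight conversion in the key points can be sketched as a simple relabeling step: each on-policy generation is turned into a self-awareness task that asks the model to report the actual length of its own output, with the label computed after the fact. This is a minimal illustrative sketch only; the function names, prompt wording, and the choice of words (rather than tokens) as the length unit are assumptions, not the paper's exact format.

```python
# Hypothetical sketch of LARFT's hindsight length-awareness conversion:
# relabel an on-policy sample into a task whose ground-truth answer is the
# measured length of the model's own generation. All names and prompt
# templates here are illustrative assumptions.

def count_words(text: str) -> int:
    """Length measured in words; the paper may instead count tokens."""
    return len(text.split())

def to_hindsight_task(prompt: str, generation: str) -> dict:
    """Turn one on-policy sample into a self-awareness training example."""
    actual_len = count_words(generation)
    return {
        "input": (
            f"Prompt: {prompt}\n"
            f"Response: {generation}\n"
            "How many words does the response above contain?"
        ),
        # The label comes from hindsight measurement, not from the model.
        "target": str(actual_len),
    }

example = to_hindsight_task(
    "Describe RL in 20 words.",
    "Reinforcement learning trains agents by reward.",
)
```

Because the label is computed directly from the generated text, every on-policy sample yields a correctly supervised awareness example at no extra annotation cost.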