LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
arXiv cs.LG · April 17, 2026
Key Points
- The paper reports that, when LLMs process long contexts, query and key vectors exhibit high-magnitude (salient) activations, and that these salient features appear important for effective long-context reasoning.
- It introduces LongAct, a long-context reinforcement learning strategy that replaces uniform parameter updates with saliency-guided sparse updates targeting weights tied to these salient activations.
- LongAct delivers about an 8% improvement on LongBench v2 and improves generalization on the RULER benchmark.
- The approach is described as broadly compatible, providing performance gains across multiple RL algorithms (including GRPO and DAPO) and supported by ablation studies that highlight the role of salient features.
- The work reframes long-context RL training by leveraging intrinsic representation characteristics rather than relying primarily on reward engineering or data synthesis.
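The core idea of saliency-guided sparse updates can be sketched roughly as follows: score activation dimensions by magnitude, keep only the top fraction, and restrict the parameter update to the weight rows tied to those dimensions. Note that the function names, the `top_frac` threshold, and the row-wise masking scheme below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def salient_mask(activations, top_frac=0.1):
    """Mark the most salient activation dimensions.

    activations: (tokens, dim) array, e.g. query/key projections
    over a long context. Saliency here is the mean absolute
    activation per dimension (an assumed proxy, not the paper's
    exact criterion).
    """
    scores = np.abs(activations).mean(axis=0)
    k = max(1, int(top_frac * scores.size))
    idx = np.argsort(scores)[-k:]          # indices of top-k dims
    mask = np.zeros(scores.size, dtype=bool)
    mask[idx] = True
    return mask

def sparse_update(W, grad, mask, lr=1e-2):
    """Apply the RL gradient only to weight rows tied to salient
    activation dimensions; all other rows are left untouched,
    replacing a uniform dense update with a sparse one."""
    W_new = W.copy()
    W_new[mask] -= lr * grad[mask]
    return W_new
```

A minimal usage example: compute the mask once from activations collected on long-context rollouts, then reuse it to gate each policy-gradient step. The same gating is what makes the scheme algorithm-agnostic, since it wraps the update rule of GRPO, DAPO, or any other RL optimizer without changing the objective.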


