Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks
arXiv cs.CL / 4/6/2026
Key Points
- The paper introduces “Rubrics to Tokens (RTT),” a rubric-based reinforcement learning framework aimed at improving LLM alignment on open-domain instruction-following tasks.
- It addresses reward sparsity and ambiguity by moving from coarse response-level rewards to fine-grained token-level credit assignment using a Token-Level Relevance Discriminator.
- RTT-GRPO is proposed to unify response-level and token-level advantages in a single optimization framework for the policy model.
- To handle a shift from one-dimensional outcome rewards to a three-dimensional token-level rubric reward space, the authors propose “Intra-sample Token Group Normalization.”
- Reported experiments indicate RTT achieves higher instruction-level and rubric-level accuracy than existing baselines across multiple model backbones.
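The key points above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes standard GRPO-style group normalization of response-level rewards, a per-sample normalization over a hypothetical three-dimensional token-level rubric reward array (standing in for “Intra-sample Token Group Normalization”), and a weighted sum combining the two advantage signals (standing in for RTT-GRPO). All function names and the `alpha` weight are illustrative assumptions.

```python
import numpy as np

def grpo_response_advantages(rewards):
    """GRPO-style advantage: normalize response-level rewards across the group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def intra_sample_token_norm(token_rewards):
    """Hypothetical intra-sample normalization of token-level rubric rewards.

    token_rewards: array of shape (num_tokens, num_rubric_dims) for ONE response.
    Each rubric dimension is normalized within the sample, then averaged,
    yielding one scalar advantage per token.
    """
    t = np.asarray(token_rewards, dtype=float)
    norm = (t - t.mean(axis=0)) / (t.std(axis=0) + 1e-8)
    return norm.mean(axis=1)

def rtt_grpo_advantages(response_rewards, token_rewards_per_response, alpha=0.5):
    """Combine response-level and token-level advantages per token (sketch).

    Returns one per-token advantage array per response: the shared
    response-level advantage plus an alpha-weighted token-level term.
    """
    resp_adv = grpo_response_advantages(response_rewards)
    combined = []
    for a_resp, tok in zip(resp_adv, token_rewards_per_response):
        combined.append(a_resp + alpha * intra_sample_token_norm(tok))
    return combined
```

Under these assumptions, every token in a response inherits the group-normalized outcome advantage, while tokens the rubric discriminator scores highly receive an additional positive nudge, addressing the reward-sparsity problem the paper describes.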