Execution-Grounded Credit Assignment for GRPO in Code Generation
arXiv cs.LG / 3/18/2026
Key Points
- The authors address coarse credit assignment in critic-free RL for code generation, observing that errors in generated code can be localized rather than treated as global.
- They propose Execution-Grounded Credit Assignment (EGCA), which uses execution traces to localize GRPO updates to the token span corresponding to the earliest semantic divergence.
- EGCA runs the candidate and a canonical reference solution under identical instrumentation to determine where the failure occurs and masks downstream tokens for targeted credit.
- It is a drop-in modification requiring no critic, auxiliary loss, or learned verifier, and it yields substantial accuracy gains, reaching 82.1% pass@1 on HumanEval and 68.9% on MBPP, with ~18% overhead.
- The approach suggests a general method to improve RL-based code generation by grounding credit in execution traces rather than global outcomes.
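
The key points above describe localizing a sequence-level GRPO advantage to the token span at the earliest divergence between candidate and reference traces. The following is a minimal sketch of that idea, not the authors' implementation. All names (`first_divergence`, `group_relative_advantages`, `localized_token_advantages`, `token_steps`) are hypothetical. It assumes that traces are already available as lists of comparable states, that each token has already been aligned to a trace step, that tokens outside the divergent span (both before and after it) receive zero advantage, and that a rollout whose trace matches the reference keeps its full advantage; the summary does not specify any of these details.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def first_divergence(candidate_trace: Sequence, reference_trace: Sequence) -> Optional[int]:
    """Return the index of the first trace step where the two runs differ.

    Returns None when the traces are identical. Traces are assumed to be
    sequences of comparable states recorded under the same instrumentation.
    """
    for i, (cand, ref) in enumerate(zip(candidate_trace, reference_trace)):
        if cand != ref:
            return i
    if len(candidate_trace) != len(reference_trace):
        # One run stopped early: the divergence is the first missing step.
        return min(len(candidate_trace), len(reference_trace))
    return None


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Critic-free, group-normalized advantages: (r - mean) / (std + eps).

    Using the population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def localized_token_advantages(
    advantage: float,
    token_steps: Sequence[int],
    divergent_step: Optional[int],
) -> List[float]:
    """Spread one sequence-level advantage over tokens, masked to one span.

    token_steps[i] is the trace step that token i is attributed to
    (a hypothetical alignment). Tokens outside the divergent step get 0.0.
    """
    if divergent_step is None:
        # No divergence found: keep the ordinary, unmasked update (assumption).
        return [advantage] * len(token_steps)
    return [advantage if step == divergent_step else 0.0 for step in token_steps]


# Toy example: the candidate's third recorded state differs from the reference.
candidate_trace = [0, 1, 3, 7]
reference_trace = [0, 1, 2, 4]
step = first_divergence(candidate_trace, reference_trace)

# Eight tokens, each attributed to one of the four trace steps.
token_steps = [0, 0, 1, 1, 2, 2, 2, 3]
per_token = localized_token_advantages(-1.0, token_steps, step)
```

In this toy run only the three tokens attributed to step 2 carry the negative advantage; the rest are zeroed. In a real training loop the per-token values would multiply the usual policy-gradient term for each token, and the alignment from tokens to trace steps would have to come from the instrumentation itself.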




