Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

arXiv cs.CL · April 23, 2026


Key Points

  • The paper proposes “zero-shot cross-programming-language transfer for code RL,” aiming to leverage the universality of coding skills across different programming languages when training data is limited for lower-resource languages.
  • It finds that, for Llama-3.1, performing RL training on code generation in a source language can fail to improve—and sometimes degrades—performance on other target languages.
  • To enable more effective RL transfer, the authors hypothesize that RL needs a more generalizable supervised fine-tuning (SFT) initialization beforehand.
  • They introduce “Parallel-SFT,” which mixes together functionally equivalent code implementations written in multiple programming languages during SFT, and show that this improves subsequent RL generalization to unseen languages.
  • Internal representation analysis suggests Parallel-SFT produces a more functionality-centric latent space, clustering semantically equivalent programs across languages and thereby boosting transferability.
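The core data-construction idea behind Parallel-SFT can be illustrated with a small sketch. This is a hypothetical reconstruction, not the paper's code: the `Problem` class, `build_parallel_sft_mixture` function, and the prompt template are all illustrative assumptions about what "mixing functionally equivalent implementations across languages into SFT" could look like.

```python
# Hypothetical sketch of a Parallel-SFT data mixture: each coding problem
# contributes one SFT example per programming language, so functionally
# equivalent implementations appear side by side during fine-tuning.
# All names here are illustrative, not the paper's actual code.
import random
from dataclasses import dataclass, field

@dataclass
class Problem:
    prompt: str                                     # natural-language task
    solutions: dict = field(default_factory=dict)   # language -> equivalent code

def build_parallel_sft_mixture(problems, seed=0):
    """Flatten parallel programs into (instruction, response) SFT pairs."""
    rng = random.Random(seed)
    examples = []
    for p in problems:
        for lang, code in p.solutions.items():
            examples.append({
                "instruction": f"{p.prompt}\nWrite the solution in {lang}.",
                "response": code,
            })
    rng.shuffle(examples)  # interleave languages within the mixture
    return examples

problems = [Problem(
    prompt="Return the sum of a list of integers.",
    solutions={
        "Python": "def total(xs):\n    return sum(xs)",
        "C++": "int total(const std::vector<int>& xs) {\n"
               "    int s = 0;\n    for (int x : xs) s += x;\n    return s;\n}",
    },
)]
mixture = build_parallel_sft_mixture(problems)
print(len(mixture))  # one SFT example per (problem, language) pair
```

The intended effect is that the model repeatedly sees the same functionality expressed in different surface syntaxes, encouraging a language-agnostic internal representation before RL begins.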

Abstract

Modern language models demonstrate impressive coding capabilities in common programming languages (PLs), such as C++ and Python, but their performance in lower-resource PLs is often limited by training data availability. In principle, however, most programming skills are universal across PLs, so capability acquired in one PL should transfer to others. In this work, we propose the task of zero-shot cross-programming-language transfer for code RL. We find that, for Llama-3.1, RL training for code generation in a source PL fails to improve, and sometimes even degrades, performance on other target PLs. To address this, we hypothesize that effective RL transfer requires a generalizable SFT initialization before RL. We thus propose **Parallel-SFT**, an SFT strategy that incorporates "parallel programs" -- functionally equivalent code implemented in multiple PLs -- into the data mixture. We demonstrate that this improves transferability: when we subsequently perform RL on our Parallel-SFT model, we observe better generalization to unseen PLs. Analysis of the model's internal representations reveals that Parallel-SFT leads to a more functionality-centric latent space, where equivalent programs across PLs are more tightly clustered, which we hypothesize contributes to the improved transferability.
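The latent-space claim at the end of the abstract suggests a simple kind of measurement: if the space is functionality-centric, hidden states of equivalent programs across languages should be more similar to each other than to programs with different functionality. The sketch below illustrates one such metric under stated assumptions; the toy vectors stand in for real model hidden states, and `cross_language_cohesion` is an assumed name, not a metric from the paper.

```python
# Toy sketch of a functionality-clustering analysis: compute the mean
# pairwise cosine similarity among embeddings of functionally equivalent
# programs (one vector per language implementation). Higher cohesion
# suggests a more language-agnostic, functionality-centric latent space.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cross_language_cohesion(embeddings):
    """Mean pairwise cosine similarity within each functionality cluster.

    embeddings: dict mapping a functionality id to a list of vectors, one
    per programming-language implementation of that functionality.
    """
    sims = []
    for vecs in embeddings.values():
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                sims.append(cosine(vecs[i], vecs[j]))
    return sum(sims) / len(sims)

# Toy data: "sum" programs cluster near [1, 0], "sort" programs near [0, 1],
# mimicking a functionality-centric space.
emb = {
    "sum":  [np.array([0.9, 0.1]), np.array([1.0, 0.0])],
    "sort": [np.array([0.1, 0.9]), np.array([0.0, 1.0])],
}
cohesion = cross_language_cohesion(emb)
print(round(cohesion, 3))
```

In a real analysis one would extract hidden states from the SFT model for parallel programs and compare cohesion before and after Parallel-SFT; the paper reports tighter clustering after it.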