SemRep: Generative Code Representation Learning with Code Transformations
arXiv cs.LG / 3/17/2026
💬 Opinion / Models & Research
Key Points
- SemRep proposes using semantics-preserving code transformations as an intermediate representation that guides generative code transformation and downstream instruction-specific edits (an illustrative transformation is sketched after this list).
- When trained with the same budget, the framework improves general code editing and optimization tasks (e.g., GPU kernel optimization): 6.9% higher correctness, 1.1x better performance, 13.9% better generalization, and 6.7% better robustness.
- SemRep broadens exploration of diverse code transformations and pairs well with an evolutionary coding agent, discovering optimizations that much larger baselines miss while using 25% less inference compute for the same performance (a sketch of such a search loop follows the transformation example below).
- By decoupling representation learning from end-to-end editing, SemRep provides a more flexible, semantics-guided approach to code transformation.
- The approach demonstrates broad applicability across tasks, suggesting improved robustness and generalization in generative code modeling.
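To make "semantics-preserving code transformation" concrete, here is a minimal Python sketch of one such rewrite: canonical renaming of local variables via the `ast` module. The class and function names here are illustrative assumptions, not SemRep's implementation; the paper's actual transformation set is not specified in this summary.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Canonically rename variables (v0, v1, ...) without changing behavior.

    Simplifying assumption: the snippet only uses local names, so a blanket
    rename of Name/arg nodes is safe. A real transformation pipeline would
    need proper scope analysis.
    """

    def __init__(self):
        self.mapping = {}

    def _canonical(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_Name(self, node):
        node.id = self._canonical(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canonical(node.arg)
        return node


def rename_locals(source: str) -> str:
    """Apply the transformation and return behaviorally equivalent source."""
    tree = ast.parse(source)
    return ast.unparse(ast.fix_missing_locations(RenameLocals().visit(tree)))


before = (
    "def weighted_sum(xs, factor):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        total += x * factor\n"
    "    return total\n"
)
print(rename_locals(before))
# def weighted_sum(v0, v1):
#     v2 = 0
#     for v3 in v0:
#         v2 += v3 * v1
#     return v2
```

Rewrites of this kind yield pairs of equivalent programs, which is one plausible way such transformations could serve as a supervision signal for representation learning.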
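The evolutionary-agent claim above can be pictured as a propose-score-select loop over transformation sequences. The sketch below shows only that loop's structure; `propose_edit` and `score` are hypothetical toy stand-ins for the model-driven proposal step and the measured correctness/runtime objective, not the paper's actual agent.

```python
import random

# Hypothetical pool of transformation names; in a real agent these would be
# model-proposed, semantics-preserving edits applied to actual code.
TRANSFORMS = ["rename_locals", "unroll_inner_loop", "hoist_invariant", "fuse_loops"]


def propose_edit(sequence):
    """Toy stand-in for the generative model: extend a candidate sequence."""
    return sequence + [random.choice(TRANSFORMS)]


def score(sequence):
    """Toy stand-in for the measured objective.

    A real agent would apply the sequence to code, verify correctness, and
    time the result; here we just reward short sequences ending in a loop
    fusion so the loop is runnable end to end.
    """
    bonus = 1.0 if sequence and sequence[-1] == "fuse_loops" else 0.0
    return bonus - 0.1 * len(sequence)


def evolve(generations=20, population_size=8, survivors=2):
    population = [[] for _ in range(population_size)]
    for _ in range(generations):
        # Propose one new edit per candidate, pool parents with children.
        children = [propose_edit(seq) for seq in population]
        pool = population + children
        # Keep the best-scoring candidates and refill the population from them.
        pool.sort(key=score, reverse=True)
        population = pool[:survivors] + [
            propose_edit(random.choice(pool[:survivors]))
            for _ in range(population_size - survivors)
        ]
    return max(population, key=score)


print(evolve())
```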