GASP: Guided Asymmetric Self-Play For Coding LLMs

arXiv cs.LG / 3/18/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

GASP introduces Guided Asymmetric Self-Play, a grounding mechanism for self-play in coding LLMs that uses real-data goalpost questions to steer exploration.
During training, a teacher first generates an easier variant of a hard question and then a harder variant, gradually closing the gap to the goalpost.
Compared with unguided self-play, GASP achieves a 2.5 percentage point improvement in pass@20 on LiveCodeBench and enables solving hard goalpost questions that baselines cannot reach.
By grounding the curriculum in real tasks rather than pure difficulty, the approach addresses the problem of uninformative hard problems in prior asymmetric self-play.
The paper suggests that such grounded curricula can lead to more efficient post-training data generation for coding LLMs and better handling of hard problem distributions.

Abstract

Asymmetric self-play has emerged as a promising paradigm for post-training large language models, where a teacher continually generates questions for a student to solve at the edge of the student's learnability. Although these methods promise open-ended data generation bootstrapped from no human data, they suffer from one major problem: not all problems that are hard to solve are interesting or informative to improve the overall capabilities of the model. Current asymmetric self-play methods are goal-agnostic with no real grounding. We propose Guided Asymmetric Self-Play (GASP), where grounding is provided by real-data goalpost questions that are identified to pose a hard exploration challenge to the model. During self-play, the teacher first generates an easier variant of a hard question, and then a harder variant of that easier question, with the goal of gradually closing the gap to the goalpost throughout training. Doing so, we improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play, and through the curriculum constructed by the teacher, we manage to solve hard goalpost questions that remain out of reach for all baselines.

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Dev.to

Interesting loop

Reddit r/LocalLLaMA

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

Reddit r/LocalLLaMA

A supervisor or "manager" Al agent is the wrong way to control Al

Reddit r/artificial

FeatherOps: Fast fp8 matmul on RDNA3 without native fp8

Reddit r/LocalLLaMA

GASP: Guided Asymmetric Self-Play For Coding LLMs

Key Points

Abstract

Related Articles

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Interesting loop

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

A supervisor or "manager" Al agent is the wrong way to control Al

FeatherOps: Fast fp8 matmul on RDNA3 without native fp8

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer