KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

arXiv cs.AI / 3/12/2026


Key Points

  • KernelSkill introduces a multi-agent framework with a dual-level memory architecture that coordinates agents carrying long-term reusable optimization skills and short-term memory to avoid repetitive backtracking.
  • It replaces the implicit, opaquely learned heuristics of prior LLM-based kernel optimization with knowledge-driven expert skills, improving interpretability and efficiency.
  • On KernelBench, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines.
  • The work provides an open-source implementation (GitHub) enabling practitioners to apply KernelSkill to GPU kernel optimization.
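The dual-level memory design described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (class and function names are illustrative, not taken from the paper's implementation): long-term memory stores reusable optimization skills keyed by kernel pattern, while short-term memory records which skills have already been tried on the current task so the agent does not backtrack into repeated attempts.

```python
# Hypothetical sketch of a dual-level memory loop for kernel optimization.
# All names here are illustrative assumptions, not the paper's actual API.

class LongTermSkillMemory:
    """Long-term memory: reusable expert skills, keyed by kernel pattern."""
    def __init__(self):
        self._skills = {}  # pattern -> list of skill identifiers

    def add_skill(self, pattern, skill):
        self._skills.setdefault(pattern, []).append(skill)

    def retrieve(self, pattern):
        # Return the applicable skills for this kernel pattern.
        return list(self._skills.get(pattern, []))


class ShortTermMemory:
    """Short-term memory: tracks attempts within one task to avoid repeats."""
    def __init__(self):
        self._tried = set()

    def already_tried(self, skill):
        return skill in self._tried

    def record(self, skill):
        self._tried.add(skill)


def optimize(pattern, long_term, short_term, evaluate):
    """Try each applicable skill at most once; keep the best speedup found."""
    best_skill, best_speedup = None, 1.0  # 1.0x = unmodified baseline
    for skill in long_term.retrieve(pattern):
        if short_term.already_tried(skill):
            continue  # short-term memory prevents repetitive backtracking
        short_term.record(skill)
        speedup = evaluate(skill)  # e.g., benchmark the rewritten kernel
        if speedup > best_speedup:
            best_skill, best_speedup = skill, speedup
    return best_skill, best_speedup
```

In a real system, `evaluate` would compile and benchmark the candidate kernel, and skills would carry structured knowledge (applicability conditions, transformation steps) rather than bare strings; the point here is only the division of labor between the two memory levels.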

Abstract

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.