KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
arXiv cs.AI / 3/12/2026
Key Points
- KernelSkill introduces a multi-agent framework with a dual-level memory architecture: long-term memory holds reusable optimization skills, and short-term memory helps the coordinated agents avoid repetitive backtracking.
- It replaces the implicit heuristics of LLM-based kernel optimization with knowledge-driven expert skills, improving interpretability and efficiency.
- On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x, respectively, over Torch Eager, outperforming prior baselines.
- The work provides an open-source implementation (GitHub) enabling practitioners to apply KernelSkill to GPU kernel optimization.
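
To make the dual-level memory idea concrete, here is a minimal Python sketch. It is not taken from the KernelSkill code: the `Skill`, `ShortTermMemory`, and `select_skill` names, the skill entries, and the bottleneck labels are all hypothetical, chosen only to illustrate how a long-term skill library and a per-task record of tried skills could work together to avoid retrying the same optimization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """Hypothetical long-term memory entry: a reusable optimization skill."""
    name: str
    applies_to: str  # bottleneck the skill targets, e.g. "memory-bound"


@dataclass
class ShortTermMemory:
    """Hypothetical per-task record of skills that were already tried."""
    tried: set = field(default_factory=set)

    def record(self, skill_name: str) -> None:
        self.tried.add(skill_name)


def select_skill(skills, bottleneck, memory):
    """Return the first matching skill not yet tried on this task, else None."""
    for skill in skills:
        if skill.applies_to == bottleneck and skill.name not in memory.tried:
            return skill
    return None


# Illustrative skill library (assumed entries, not from the paper).
SKILLS = [
    Skill("coalesce_global_loads", "memory-bound"),
    Skill("tile_into_shared_memory", "memory-bound"),
    Skill("unroll_inner_loop", "compute-bound"),
]

memory = ShortTermMemory()

first = select_skill(SKILLS, "memory-bound", memory)
memory.record(first.name)

# The second lookup skips the skill already recorded in short-term memory.
second = select_skill(SKILLS, "memory-bound", memory)
memory.record(second.name)

# No untried memory-bound skill remains, so the search stops instead of looping.
third = select_skill(SKILLS, "memory-bound", memory)
```

In this sketch the skill library persists across tasks while `ShortTermMemory` is created fresh per kernel, which is one plausible way to separate the two levels; the actual framework may organize them differently.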



