Knowledge Distillation for Large Language Models
arXiv cs.CL / 3/17/2026
Key Points
- The paper proposes a resource-efficient framework for compressing large language models via knowledge distillation combined with guided chain-of-thought reinforcement learning, using Qwen 3B as the teacher and Qwen 0.5B as the student (a distillation-loss sketch follows this list).
- It applies distillation across English Dolly-15k, Spanish Dolly-15k, and the code datasets BugNet and PyTorrent, reusing English-tuned hyperparameters, and retains 70-91% of the teacher's performance in English, up to 95% in Spanish, and up to 93.5% ROUGE-L on code.
- For coding tasks, integrating chain-of-thought prompting with Group Relative Policy Optimization (GRPO) on CoT-annotated Codeforces data improves reasoning coherence and solution correctness over knowledge distillation alone (see the GRPO sketch after this list).
- Post-training 4-bit weight quantization further reduces the memory footprint and inference latency, enabling deployment in resource-constrained settings (a quantized-loading sketch closes the section below).
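
The summary does not reproduce the paper's training code, so the following is a minimal sketch of response-based distillation between a Qwen 3B teacher and a Qwen 0.5B student, assuming a standard soft-label KL objective blended with cross-entropy on the ground-truth labels; the exact checkpoint names, temperature, and loss weighting are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

# Teacher (Qwen 3B) and student (Qwen 0.5B) as named in the paper;
# the specific Hugging Face checkpoints are assumptions.
teacher = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct").eval()
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

def distillation_loss(input_ids, attention_mask, labels, T=2.0, alpha=0.5):
    """Blend soft-label KL against the teacher with the usual CE loss on labels."""
    with torch.no_grad():
        t_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits
    s_out = student(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    # KL between temperature-softened token distributions, scaled by T^2
    # so gradients keep a comparable magnitude across temperatures.
    kl = F.kl_div(
        F.log_softmax(s_out.logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kl + (1.0 - alpha) * s_out.loss
```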
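GRPO's defining step is scoring each sampled response relative to the other samples drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative advantage computation; the clipped policy update and the reward design for Codeforces solutions are omitted, and the shapes and reward values are assumptions.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each response's reward against
    the mean/std of its own group (all samples for the same prompt).

    rewards: shape (num_prompts, group_size), e.g. correctness scores on
    CoT-annotated Codeforces problems (the reward itself is an assumption).
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 1 prompt, 4 sampled CoT solutions, binary pass/fail rewards.
r = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(r))  # positive for passing samples, negative otherwise
```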
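The key points only state that 4-bit post-training weight quantization is applied, so the scheme below (NF4 via bitsandbytes through Hugging Face transformers), the compute dtype, and the checkpoint name stand in as assumptions for how the quantized student could be loaded for inference.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 weight quantization; the distilled student checkpoint name is hypothetical.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```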