Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

arXiv cs.AI / 5/5/2026


Key Points

  • The paper argues that distilling large reasoning models for Long-CoT is crucial because full-scale inference is too computationally expensive for practical use.
  • It criticizes existing curation-based distillation methods for selecting complete reasoning traces after the fact, thereby failing to model collaboration across heterogeneous teacher models and lacking dynamic exploration.
  • It proposes CoRD, a collaborative multi-teacher decoding framework that synthesizes Long-CoT step-by-step using predictive perplexity-based scoring together with beam search.
  • Experiments indicate CoRD generates higher-quality reasoning data and lets student models reach near teacher-level reasoning performance using fewer structured supervision signals, with minimal efficiency overhead.
  • The approach is shown to generalize to out-of-domain and open-ended scenarios, and the dataset/model are released on GitHub.

Abstract

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
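
As a rough illustration of the step-wise, multi-teacher decoding loop the abstract describes, the sketch below implements a toy beam search in which each teacher proposes one candidate next reasoning step and partial trajectories are ranked by a perplexity-derived score. The `Teacher` interface, the scoring formula, and all names here are assumptions made for illustration only, not the paper's actual implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: a "teacher" is any callable that, given the problem
# and the partial reasoning trace, proposes one candidate next step together
# with the token log-probabilities it assigned to that step.
Teacher = Callable[[str, List[str]], Tuple[str, List[float]]]


@dataclass
class Hypothesis:
    steps: List[str] = field(default_factory=list)
    score: float = 0.0  # cumulative perplexity-based score (lower is better)


def step_perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of one proposed step, computed from its token log-probs."""
    if not token_logprobs:
        return float("inf")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def collaborative_beam_decode(
    problem: str,
    teachers: List[Teacher],
    beam_width: int = 4,
    max_steps: int = 8,
) -> List[str]:
    """Toy step-wise multi-teacher decoding: every teacher extends every beam
    hypothesis by one reasoning step; the pool of candidates is re-ranked by
    a perplexity-based score and only the top `beam_width` survive."""
    beams = [Hypothesis()]
    for _ in range(max_steps):
        candidates = []
        for hyp in beams:
            for teacher in teachers:
                step, logprobs = teacher(problem, hyp.steps)
                candidates.append(
                    Hypothesis(
                        steps=hyp.steps + [step],
                        score=hyp.score + step_perplexity(logprobs),
                    )
                )
        # Keep the most confident (lowest-perplexity) partial trajectories.
        beams = sorted(candidates, key=lambda h: h.score)[:beam_width]
    return beams[0].steps
```

A real pipeline would back each `Teacher` with an LRM's generate-and-score API and would likely add step de-duplication and termination criteria, but the propose, score, prune cycle above captures the collaborative beam-search idea at a high level.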