Learning to Think from Multiple Thinkers

arXiv stat.ML / 4/28/2026

Key Points

The paper studies how learning behaves when Chain-of-Thought (CoT) supervision is provided by multiple “thinkers” who produce correct but potentially systematically different reasoning or solution traces.
It focuses on function classes that are computationally easy to learn with CoT supervision from a single thinker, but hard to learn from end-result supervision alone, i.e., without CoT (Joshi et al. 2025).
Under cryptographic assumptions, the authors show that learning can remain hard even when CoT is supplied by two or a few different thinkers in passive data-collection settings.
The paper also gives a generic, computationally efficient active learning algorithm. It needs a small amount of CoT data per thinker that is independent of the target accuracy ε, a number of thinkers that scales as log(1/ε) · log log(1/ε), and passive end-result data that scales as (1/ε) · polylog(1/ε).
Overall, the work provides both negative (hardness under assumptions) and positive (active-learning algorithm) results for multi-thinker CoT supervision.

Abstract

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon} \log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon} \cdot \mathrm{poly}\log \frac{1}{\varepsilon}$.
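To get a feel for the growth rates quoted in the abstract, the following minimal sketch evaluates them for a few values of the target accuracy. It is illustrative only: the abstract gives asymptotic scalings, so all constants are omitted, natural logarithms are assumed, and the polylog exponent `k` is a hypothetical parameter (the abstract does not state it). The function names are made up for this example and do not come from the paper.

```python
import math


def thinkers_scale(eps: float) -> float:
    """Growth term log(1/eps) * log log(1/eps), with constants omitted.

    Assumes natural logs and eps < 1/e so that log log(1/eps) is positive.
    """
    log_inv = math.log(1.0 / eps)
    return log_inv * math.log(log_inv)


def end_result_scale(eps: float, k: int = 1) -> float:
    """Growth term (1/eps) * log(1/eps)**k, with constants omitted.

    The exponent k is an assumption: the abstract says only "polylog".
    """
    return (1.0 / eps) * math.log(1.0 / eps) ** k


if __name__ == "__main__":
    # Print both growth terms side by side for shrinking eps.
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        print(
            f"eps={eps:g}  thinkers~{thinkers_scale(eps):.2f}  "
            f"end-result~{end_result_scale(eps):.0f}"
        )
```

Under these assumptions the thinker term grows slowly as the accuracy target tightens, while the end-result term grows roughly like 1/ε, which matches the abstract's split between scarce CoT supervision and more plentiful passive end-result data.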
