Does Weak-to-strong Generalization Happen under Spurious Correlations?

arXiv stat.ML / 3/23/2026


Key Points

  • This paper initiates a unified theoretical and algorithmic study of weak-to-strong generalization (W2S) when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on downstream tasks with spurious correlations.
  • It identifies two sources of spurious correlations due to group imbalance: a weak teacher fine-tuned on group-imbalanced labeled data with minority-group fraction ηℓ, and a group-imbalanced unlabeled set, pseudolabeled by that teacher, with minority-group fraction ηu.
  • Theoretical results show that W2S gain is guaranteed with sufficient pseudolabels when ηu = ηℓ, but may fail when ηu ≠ ηℓ, with the gain diminishing as (ηu − ηℓ)² grows.
  • Experiments on various spurious-correlation benchmarks corroborate the theory, and the authors propose a simple remedy: retraining the strong student on its high-confidence data subset after W2S fine-tuning, a group-label-free approach that improves performance.

Abstract

We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how to improve it upon failures? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction ηℓ, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction ηu. Theoretically, a precise characterization of W2S gain at the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when ηu = ηℓ but may fail when ηu ≠ ηℓ, where W2S gain diminishes as (ηu − ηℓ)² increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To boost W2S performance upon failures, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
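The proposed remedy, selecting the strong student's high-confidence examples after W2S fine-tuning and retraining on them with the student's own predictions, can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names, the fixed confidence threshold, and the use of raw softmax confidence are all assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the standard max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def confidence_retrain_subset(student_logits, threshold=0.9):
    """Group-label-free selection step (illustrative sketch).

    After W2S fine-tuning, keep only the examples on which the strong
    student is highly confident, and relabel them with the student's own
    hard predictions; the student would then be retrained on this subset.
    The threshold value here is an assumed hyperparameter.
    """
    probs = softmax(student_logits)
    conf = probs.max(axis=1)               # student's confidence per example
    keep = np.nonzero(conf >= threshold)[0]  # high-confidence subset indices
    labels = probs[keep].argmax(axis=1)      # student's own hard labels
    return keep, labels

# Toy usage: three examples, two classes; the middle example is uncertain.
logits = np.array([[5.0, 0.0],
                   [0.1, 0.0],
                   [0.0, 6.0]])
keep, labels = confidence_retrain_subset(logits)
# keep -> [0, 2]; labels -> [0, 1]: the low-confidence example is dropped.
```

Crucially, this step needs no group annotations: it relies only on the student's predicted probabilities, which is what makes the remedy group-label-free.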