Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

arXiv stat.ML / 3/31/2026


Key Points

  • The paper studies when auxiliary data improves generalization in transfer learning, focusing on two linear benchmarks: ordinary least squares regression and under-parameterized linear neural networks with shared representations.
  • For linear regression, it derives exact closed-form expressions for the expected generalization error (via a bias-variance decomposition), including necessary and sufficient conditions on the auxiliary tasks and their data that determine whether transfer helps.
  • It also computes globally optimal auxiliary task weights through solvable optimization programs and provides consistency guarantees for empirical estimates of these quantities.
  • For linear neural networks, the authors derive a non-asymptotic expectation bound and provide the first non-vacuous sufficient condition for beneficial auxiliary learning when the shared representation width is at most the number of auxiliary tasks, along with guidance for task-weight curation.
  • The theoretical results rely on a new column-wise low-rank perturbation bound for random matrices that preserves column-level structure, and the findings are supported by controlled synthetic experiments.
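The regression-side story in the key points above can be illustrated with a small Monte Carlo sketch. This is not the paper's estimator or its closed-form analysis: the task-weighted pooled OLS objective, the isotropic Gaussian design, and every parameter value below are illustrative assumptions chosen to show how an auxiliary-task weight can trade bias against variance on the main task.

```python
import numpy as np

# Monte Carlo sketch: does weighted pooling of auxiliary data reduce the
# main-task excess risk of OLS?  All parameter values are illustrative.
rng = np.random.default_rng(0)
d, n_main, n_aux, sigma, trials = 5, 20, 200, 1.0, 500

beta_main = rng.normal(size=d)
beta_aux = beta_main + 0.1 * rng.normal(size=d)   # mild task shift

def sample(n, beta):
    X = rng.normal(size=(n, d))
    return X, X @ beta + sigma * rng.normal(size=n)

def weighted_ols(Xm, ym, Xa, ya, w):
    # argmin_b ||ym - Xm b||^2 + w * ||ya - Xa b||^2
    A = Xm.T @ Xm + w * (Xa.T @ Xa)
    return np.linalg.solve(A, Xm.T @ ym + w * (Xa.T @ ya))

def excess_risk(b):
    # E_x[(x^T b - x^T beta_main)^2] = ||b - beta_main||^2 for isotropic x
    return float(np.sum((b - beta_main) ** 2))

results = {}
for w in (0.0, 0.1, 1.0):   # w = 0 recovers main-task-only OLS
    risks = [excess_risk(weighted_ols(*sample(n_main, beta_main),
                                      *sample(n_aux, beta_aux), w))
             for _ in range(trials)]
    results[w] = float(np.mean(risks))
    print(f"w = {w:.1f}: mean excess risk = {results[w]:.3f}")
```

With a small task shift and scarce main-task data, the pooled estimators (w > 0) typically beat main-task-only OLS; increasing the shift flips the comparison, which is the regime the paper's necessary-and-sufficient conditions characterize exactly.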

Abstract

In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width q ≤ K, where K is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.
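For orientation, the bias-variance decomposition the abstract refers to has the following standard shape; the paper's closed-form results refine the bias and variance terms for its task-weighted estimators, which is not reproduced here.

```latex
% Expected prediction error at a fresh main-task point (x, y),
% where y = x^\top \beta + \varepsilon and \mathrm{Var}(\varepsilon) = \sigma^2:
\mathbb{E}\big[(y - x^\top \hat{\beta})^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(x^\top(\mathbb{E}[\hat{\beta}] - \beta)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big(x^\top \hat{\beta}\big)}_{\text{variance}}
```

Auxiliary data shrinks the variance term (more effective samples) while generally inflating the bias term (the auxiliary tasks pull the estimator away from the main-task parameter); transfer helps exactly when the first effect dominates.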