Model-agnostic information transfer and fusion for classification with label noise

arXiv stat.ML / 4/29/2026


Key Points

  • The paper tackles learning from datasets with label noise by using a common “coarse noisy labels + small clean expert-labeled set” paradigm, framing it as an information transfer and fusion problem.
  • It argues that existing statistical transfer learning methods break down because of a substantial distribution shift between noisy and clean data and because they assume parametric models that are a poor fit for complex inputs like images.
  • The authors propose a model-agnostic, nonparametric classification framework that can work across a broad class of classifiers rather than being tied to specific model architectures.
  • The method uses the small clean dataset to “purify” the larger noisy dataset while explicitly handling the remaining ambiguous samples, backed by a rigorous statistical theory.
  • Experiments include simulations and a medical imaging case study for pneumonia diagnosis, showing practical effectiveness of the framework.
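The "purify" step described above can be illustrated with a toy sketch. The paper does not specify the exact rule, so everything here is an illustrative assumption: a 1-nearest-neighbour classifier stands in for the model fit on the small clean set, and a simple agreement check splits the noisy data into confirmed and ambiguous samples.

```python
# Hypothetical sketch of the "purify" idea: screen the large noisy set with a
# model fit on the small clean set. The 1-NN classifier and the agreement rule
# are illustrative assumptions, not the paper's actual procedure.

def knn_label(x, clean_x, clean_y):
    """Label x by its nearest clean neighbour (1-NN, squared distance)."""
    best = min(range(len(clean_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, clean_x[i])))
    return clean_y[best]

def purify(noisy, clean_x, clean_y):
    """Split noisy (x, y) pairs into 'kept' (noisy label confirmed by the
    clean model) and 'ambiguous' (disagreement, to be handled separately)."""
    kept, ambiguous = [], []
    for x, y in noisy:
        (kept if knn_label(x, clean_x, clean_y) == y else ambiguous).append((x, y))
    return kept, ambiguous

# Tiny demo: two clean points, three noisy points, one with a wrong label.
clean_x = [(0.0, 0.0), (1.0, 1.0)]
clean_y = [0, 1]
noisy = [((0.1, 0.1), 0),   # agrees with clean model -> kept
         ((0.9, 0.8), 0),   # contradicts clean model -> ambiguous
         ((1.1, 1.0), 1)]   # agrees -> kept
kept, ambiguous = purify(noisy, clean_x, clean_y)
print(len(kept), len(ambiguous))  # 2 1
```

In the paper's actual framework the ambiguous set is not discarded but managed explicitly, which is where the statistical theory comes in; this sketch only shows the coarse split.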

Abstract

Label noise presents a fundamental challenge in modern machine learning, especially when large-scale datasets are generated via automated processes. An increasingly common and important data paradigm, particularly in domains like medical imaging, involves learning from a large dataset with coarse, noisy labels supplemented by a small, expert-verified, clean dataset. This setting constitutes a typical information transfer and fusion problem. However, the significant distribution shift between the noisy and clean data violates the core overall parametric similarity assumptions of existing statistical transfer learning methods, while their reliance on parametric models is ill-suited for complex data like images. To address these limitations, this paper develops a generic model-agnostic nonparametric framework for classification with label noise, which applies to a broad class of classifiers. Our approach leverages the small clean dataset to "purify" the large noisy one and carefully manages the remaining ambiguous samples. This framework is underpinned by a rigorous statistical theory. Its empirical performance is demonstrated through simulations and a real-world application to medical image analysis for pneumonia diagnosis.