Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information

arXiv stat.ML / 4/7/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper proposes an empirical-likelihood framework to “fuse” summary-level external machine-learning predictions into primary-study inference for multinomial logistic regression.
It leverages nonparametric ML predictions to induce a rich set of moment constraints that can remain robust to covariate shift under an overlap condition, without needing density-ratio modeling.
The method explicitly handles external-data quality problems such as coarsened outcomes, partially observed covariates, covariate shift, and concept shift (heterogeneous data-generating mechanisms).
The authors prove large-sample theoretical results for the fused estimator, including consistency and asymptotic normality, and provide conditions under which using external predictions yields a strict efficiency improvement over primary-only analysis.
They validate the approach with simulations and demonstrate an application to multiclass blood-pressure classification using NHANES data.

Abstract

In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric machine-learning predictions. Although summary-level external information has been studied in the data integration literature, there is limited methodology for leveraging external nonparametric machine-learning predictions to improve statistical inference in the primary study. We propose a general empirical-likelihood framework that incorporates external predictions through moment constraints. An advantage of nonparametric machine-learning prediction is that it induces a rich class of valid moment restrictions that remain robust to covariate shift under a mild overlap condition without requiring explicit density-ratio modeling. We focus on multinomial logistic regression as the primary model and address common data-quality issues in external sources, including coarsened outcomes, partially observed covariates, covariate shift, and heterogeneity in generating mechanisms known as concept shift. We establish large-sample properties of the resulting fused estimator, including consistency and asymptotic normality under regularity conditions. Moreover, we provide mild sufficient conditions under which incorporating external predictions delivers a strict efficiency gain relative to the primary-only estimator. Simulation studies and an application to the National Health and Nutrition Examination Survey on multiclass blood-pressure classification.

Inside Anthropic's Project Glasswing: The AI Model That Found Zero-Days in Every Major OS

Dev.to

Gemma 4 26B fabricated an entire code audit. I have the forensic evidence from the database.

Reddit r/LocalLLaMA

How AI Humanizers Improve Sentence Structure and Style

Dev.to

Two Kinds of Agent Trust (and Why You Need Both)

Dev.to

Agent Diary: Apr 10, 2026 - The Day I Became a Workflow Ouroboros (While Run 236 Writes About Writing About Writing)

Dev.to

Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information

Key Points

Abstract

Related Articles

Inside Anthropic's Project Glasswing: The AI Model That Found Zero-Days in Every Major OS

Gemma 4 26B fabricated an entire code audit. I have the forensic evidence from the database.

How AI Humanizers Improve Sentence Structure and Style

Two Kinds of Agent Trust (and Why You Need Both)

Agent Diary: Apr 10, 2026 - The Day I Became a Workflow Ouroboros (While Run 236 Writes About Writing About Writing)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer