Expectation Maximization (EM) Converges for General Agnostic Mixtures

arXiv cs.LG / 4/8/2026


Key Points

  • The paper studies an agnostic mixture-fitting problem where data are not assumed to follow a generative model, and the goal is to fit k parametric functions by minimizing a chosen loss.
  • It generalizes prior results on mixed linear regression by analyzing gradient EM with any strongly convex, smooth loss, covering settings such as (regularized) mixed linear regression, mixed classifiers (logistic/SVM), and mixed generalized linear regression.
  • Under proper initialization and a separation condition, the authors prove that gradient EM iterates converge exponentially to population loss minimizers with high probability.
  • The results extend the reach of EM-type methods, showing that gradient EM can attain (appropriately defined) optimal solutions even in a non-generative, agnostic regime beyond mixtures of linear regressions; a minimal sketch of one such iteration appears after this list.
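
To make the procedure concrete, here is a minimal sketch of one gradient EM iteration, instantiated with squared loss on linear models (the mixed linear regression special case). The soft-min weighting with temperature beta, the step size eta, and the function names are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def softmin_weights(losses, beta=1.0):
    """Soft assignment of each point to the k models: points with lower
    loss under model j receive higher weight for model j.
    losses: (n, k) array of per-point, per-model losses."""
    z = -beta * losses
    z -= z.max(axis=1, keepdims=True)  # stabilize the exponentials
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

def gradient_em_step(X, y, thetas, eta=0.1, beta=10.0):
    """One gradient EM iteration: the E-step computes soft assignments
    from current per-model losses; the M-step takes a single gradient
    step per model on its weighted loss, instead of solving the M-step
    exactly (hence *gradient* EM)."""
    n, d = X.shape
    k = thetas.shape[0]
    preds = X @ thetas.T                      # (n, k) predictions
    losses = 0.5 * (preds - y[:, None]) ** 2  # (n, k) squared losses
    w = softmin_weights(losses, beta)         # E-step: soft assignments
    new_thetas = thetas.copy()
    for j in range(k):
        # Weighted gradient of the squared loss for model j
        grad = (w[:, j] * (preds[:, j] - y)) @ X / n
        new_thetas[j] = thetas[j] - eta * grad  # M-step: one gradient step
    return new_thetas
```

Starting from a sufficiently separated initialization, as the paper's guarantees require, one would iterate gradient_em_step until the parameters stabilize; swapping in a different strongly convex, smooth loss only changes the losses and grad lines.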

Abstract

Mixed linear regression is well studied in statistics and machine learning: data points are assumed to be generated probabilistically from one of k linear models, and algorithms like Expectation Maximization (EM) may be used to recover the ground-truth regressors. Recently, in [pal2022learning, ghosh_agnostic], the mixed linear regression problem was studied in the agnostic setting, where no generative model on the data is assumed. Rather, given a set of data points, the objective is to *fit* k lines by minimizing a suitable loss function. It was shown that a modification of EM, namely gradient EM, converges exponentially to an appropriately defined loss minimizer even in this agnostic setting. In this paper, we study the problem of *fitting* k parametric functions to a given set of data points. We adhere to the agnostic setup. However, instead of fitting lines under the quadratic loss, we consider fitting arbitrary parametric functions equipped with any strongly convex and smooth loss. This framework encompasses a large class of problems, including (regularized) mixed linear regression, mixed linear classifiers (mixed logistic regression, mixed support vector machines), and mixed generalized linear regression. We propose and analyze gradient EM for this problem and show that, with proper initialization and a separation condition, the iterates of gradient EM converge exponentially to appropriately defined population loss minimizers with high probability. This demonstrates the effectiveness of EM-type algorithms, which converge to *optimal* solutions in the non-generative setup beyond mixtures of linear regressions.
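
For orientation, one way to write down the kind of objects the abstract refers to is the following; the soft-assignment form and step size η below are illustrative assumptions, not the paper's exact definitions. The agnostic population loss fits k parametric functions $f_{\theta_j}$ under a strongly convex, smooth loss $\ell$:

$$
\mathcal{L}(\theta_1,\dots,\theta_k) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\min_{j\in[k]} \ell\big(f_{\theta_j}(x),\,y\big)\Big],
$$

and a generic gradient EM step alternates soft assignments with one gradient step per component:

$$
w_j^{(t)}(x,y) \;=\; \frac{\exp\!\big(-\ell(f_{\theta_j^{(t)}}(x),y)\big)}{\sum_{l=1}^{k}\exp\!\big(-\ell(f_{\theta_l^{(t)}}(x),y)\big)},
\qquad
\theta_j^{(t+1)} \;=\; \theta_j^{(t)} - \eta\,\hat{\mathbb{E}}\Big[w_j^{(t)}(x,y)\,\nabla_{\theta}\,\ell\big(f_{\theta_j^{(t)}}(x),y\big)\Big],
$$

where $\hat{\mathbb{E}}$ denotes the empirical average over the data. The paper's convergence claim is that, under proper initialization and separation, these iterates approach the minimizers of $\mathcal{L}$ at an exponential rate with high probability.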