Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

arXiv stat.ML / 4/1/2026

Key Points

  • The paper proposes nonnegative matrix factorization using a component-wise L1 error (L1-NMF) to better handle heavy-tailed noise and outliers compared with standard least-squares NMF.
  • It proves that L1-NMF is NP-hard even for rank r = 1, in contrast to standard least-squares NMF, whose rank-1 case can be solved in polynomial time via the SVD.
  • The authors show that L1-NMF strongly enforces sparsity in the learned factors when the input matrix is sparse. This favors interpretability, but overly sparse factors can degrade the model when the data contains false zeros (entries recorded as zero that should be nonzero).
  • To address this, they introduce weighted L1-NMF (wL1-NMF), which controls sparsity by penalizing entries of WH corresponding to zeros in the observed data.
  • They present a coordinate descent algorithm (sparse CD / sCD) with subproblems solved via a weighted median method, and argue its complexity scales with the number of nonzero entries, making it suitable for large-scale sparse datasets.
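
The weighted-median subproblem mentioned in the last point can be illustrated with a minimal Python sketch. It assumes the usual scalar reduction in coordinate descent for an L1 objective: with all other variables fixed, a single entry x >= 0 minimizes sum_i |a_i - b_i * x| with b_i >= 0, and a minimizer is the weighted median of the ratios a_i / b_i with weights b_i, clipped at zero. The function names and dense-array inputs are illustrative assumptions, not the authors' sCD implementation, which additionally exploits sparsity.

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - x| over real x."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted position where the cumulative weight reaches half the total.
    k = np.searchsorted(cum, 0.5 * cum[-1])
    return v[k]

def l1_scalar_update(a, b):
    """Minimize sum_i |a[i] - b[i] * x| over x >= 0, assuming b >= 0 (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = b > 0  # terms with b[i] == 0 do not depend on x
    if not mask.any():
        return 0.0
    # |a_i - b_i x| = b_i * |a_i / b_i - x|, so this is a weighted median problem.
    x = weighted_median(a[mask] / b[mask], b[mask])
    # The objective is convex in x, so clipping at 0 gives the constrained minimizer.
    return max(0.0, float(x))
```

For example, `l1_scalar_update([2, 4, 6, 100], [1, 1, 1, 1])` picks a value between 4 and 6 and is unaffected by how large the outlier 100 is, whereas a least-squares update would return the mean, 28.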

Abstract

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, X, by the product of two nonnegative factors, WH, where W has r columns and H has r rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt-and-pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when r=1, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, overly sparse solutions might degrade the model. Our third contribution is a new, more general L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of WH associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our newly proposed model (wL1-NMF) and algorithm (sCD).
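
To make the "scales with the number of nonzero entries" claim concrete, here is a minimal Python sketch of evaluating the objective without forming the dense product WH. It relies on the fact that, for nonnegative W and H, |0 - (WH)_ij| = (WH)_ij at zero entries of X, and that the sum of all entries of WH equals (1^T W)(H 1). The penalty parameter `lam` reflects one plausible reading of the weighted model (a weight on the entries of WH at zeros of X) and is an assumption; with `lam = 1` the function returns the plain L1-NMF error. The function name and the COO-triplet input format are also assumptions made for illustration.

```python
import numpy as np

def l1_nmf_error_sparse(rows, cols, vals, W, H, lam=1.0):
    """Evaluate sum_{X_ij != 0} |X_ij - (WH)_ij| + lam * sum_{X_ij == 0} (WH)_ij.

    X is given by COO triplets (rows, cols, vals) with no duplicates and no
    explicit zeros; W is m-by-r and H is r-by-n, both nonnegative.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals, dtype=float)
    # (WH)_ij at the stored nonzero positions only: O(nnz * r) work.
    wh_nz = np.einsum("kr,rk->k", W[rows, :], H[:, cols])
    fit = np.abs(vals - wh_nz).sum()
    # Sum of all entries of WH is (1^T W)(H 1): O((m + n) * r) work.
    total = W.sum(axis=0) @ H.sum(axis=1)
    # Mass of WH sitting on the zero entries of X.
    zeros_mass = total - wh_nz.sum()
    return float(fit + lam * zeros_mass)
```

The design choice here is to never touch the zero entries of X individually: their contribution is obtained by subtracting the mass of WH at the nonzero positions from the total mass of WH.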