Identification of NMF by choosing maximum-volume basis vectors

arXiv cs.LG / 3/26/2026


Key Points

  • The paper argues that minimum-volume-constrained NMF typically depends on sparsity in the coefficient matrix, so it can fail on highly mixed data, where that sparsity is absent, and its estimated basis vectors can be hard to interpret.
  • It introduces a new framework, maximum-volume-constrained NMF, designed to make the learned basis vectors as distinct as possible.
  • The authors prove an identifiability theorem for the maximum-volume-constrained approach, which concerns when the factorization can be uniquely recovered.
  • They also provide an estimation algorithm and report experiments demonstrating the method’s effectiveness.
  • The work targets more interpretable NMF results by reducing the tendency for learned basis vectors to become mixtures of the true underlying components.
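
The identifiability issue behind these points can be seen in a small numerical sketch. It is not taken from the paper: the helper `remix`, the mixing matrix `Q`, and the random data are hypothetical choices made only for illustration. When every coefficient is bounded away from zero (highly mixed data), the basis vectors can be replaced by mixtures of themselves without changing the product or violating nonnegativity.

```python
import numpy as np

def remix(W, H, eps=0.2):
    """Build (W2, H2) with W2 @ H2 == W @ H by mixing the basis vectors.

    Q blends every basis vector with the average of all of them; this Q is an
    illustrative assumption, not something specified in the paper.
    """
    r = W.shape[1]
    Q = (1 - eps) * np.eye(r) + (eps / r) * np.ones((r, r))
    W2 = W @ Q                  # each new basis vector is a mixture of the old ones
    H2 = np.linalg.solve(Q, H)  # compensating coefficients, Q^{-1} H
    return W2, H2

rng = np.random.default_rng(0)
r, m, n = 3, 5, 8
W = rng.random((m, r))  # nonnegative basis (columns are basis vectors)
# Highly mixed coefficients: columns sum to 1 and every entry is >= 0.1.
H = 0.7 * rng.dirichlet(np.ones(r), size=n).T + 0.1

W2, H2 = remix(W, H)
print(np.allclose(W @ H, W2 @ H2))   # do both factorizations give the same data?
print(W2.min() >= 0, H2.min() >= 0)  # are both factors still nonnegative?
```

With this particular `Q` and columns of `H` that sum to 1, any zero entry in `H` would map to a negative entry in `H2`, so sparsity in the coefficients rules out this remixing. Without that sparsity, an extra criterion such as a volume constraint is needed to pick one factorization.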

Abstract

In nonnegative matrix factorization (NMF), minimum-volume-constrained NMF is a widely used framework for identifying the solution of NMF by making basis vectors as similar as possible. This typically induces sparsity in the coefficient matrix, with each row containing zero entries. Consequently, minimum-volume-constrained NMF may fail for highly mixed data, where such sparsity does not hold. Moreover, the estimated basis vectors in minimum-volume-constrained NMF may be difficult to interpret as they may be mixtures of the ground truth basis vectors. To address these limitations, in this paper we propose a new NMF framework, called maximum-volume-constrained NMF, which makes the basis vectors as distinct as possible. We further establish an identifiability theorem for maximum-volume-constrained NMF and provide an algorithm to estimate it. Experimental results demonstrate the effectiveness of the proposed method.
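
To make "volume" concrete, here is a minimal sketch that scores a basis matrix with the log-determinant surrogate log det(WᵀW + δI), a common choice in the minimum-volume NMF literature. The abstract does not say which volume measure or normalization the paper uses, so the function `log_volume`, the parameter `delta`, and the two example matrices are assumptions for illustration only.

```python
import numpy as np

def log_volume(W, delta=1e-6):
    """Log-determinant volume surrogate for the columns of W (assumed measure)."""
    r = W.shape[1]
    # slogdet avoids overflow/underflow; the Gram matrix plus delta*I is positive definite.
    _, logdet = np.linalg.slogdet(W.T @ W + delta * np.eye(r))
    return logdet

# Distinct basis vectors: the corners of the probability simplex.
W_distinct = np.eye(3)

# Similar basis vectors: each column is a mixture of the corners above.
W_similar = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])

print(log_volume(W_distinct) > log_volume(W_similar))
```

Under this surrogate, a minimum-volume criterion favors bases like `W_similar`, while a maximum-volume criterion favors bases like `W_distinct`. In both examples the columns sum to 1, which keeps the volume from being changed simply by rescaling the columns.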