Exponential Family Discriminant Analysis: Generalizing LDA-Style Generative Classification to Non-Gaussian Models

arXiv cs.LG / 2026/3/24


Key Points

  • The paper proposes Exponential Family Discriminant Analysis (EFDA), a generative classification framework that generalizes LDA to any exponential-family class-conditional distributions rather than only Gaussians.
  • EFDA provides closed-form maximum-likelihood estimators and a decision rule that is linear in the sufficient statistics, which can yield nonlinear decision boundaries in the original feature space while recovering LDA as a special case.
  • The authors prove EFDA is asymptotically calibrated and statistically efficient under correct model specification, and they show its log-odds estimator approaches the Cramér–Rao bound.
  • In simulations across multiple exponential-family distributions (e.g., Weibull, Gamma, Exponential, Poisson, Negative Binomial), EFDA matches the classification accuracy of LDA, QDA, and logistic regression while substantially reducing Expected Calibration Error (ECE), an advantage that persists even under model misspecification.
  • To support the theoretical claims, the work formally verifies key propositions in Lean 4 with automated proof generation and machine-checked outputs.
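To make the bullet points above concrete, here is a minimal sketch of the EFDA recipe for one exponential-family member, the Poisson, where the closed-form MLE is just the per-class sample mean and the discriminant is linear in the sufficient statistic T(x) = x. The function names are illustrative, not from the paper's code.

```python
import numpy as np

def fit_poisson_efda(X, y):
    """Closed-form MLE for Poisson class-conditionals: lambda_k is the
    per-class sample mean; pi_k is the empirical class frequency."""
    classes = np.unique(y)
    rates = {k: X[y == k].mean() for k in classes}
    priors = {k: np.mean(y == k) for k in classes}
    return rates, priors

def predict_poisson_efda(X, rates, priors):
    """Discriminant linear in the sufficient statistic T(x) = x:
    delta_k(x) = x*log(lambda_k) - lambda_k + log(pi_k);
    the shared -log(x!) term cancels across classes."""
    classes = sorted(rates)
    scores = np.stack(
        [X * np.log(rates[k]) - rates[k] + np.log(priors[k]) for k in classes],
        axis=1,
    )
    return np.asarray(classes)[scores.argmax(axis=1)]
```

For families whose sufficient statistic is not the identity (e.g. a log x component for the Gamma), the same rule, while linear in T(x), induces a nonlinear boundary in the original feature space, which is the behavior the summary describes.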

Abstract

We introduce Exponential Family Discriminant Analysis (EFDA), a unified generative framework that extends classical Linear Discriminant Analysis (LDA) beyond the Gaussian setting to any member of the exponential family. Under the assumption that each class-conditional density belongs to a common exponential family, EFDA derives closed-form maximum-likelihood estimators for all natural parameters and yields a decision rule that is linear in the sufficient statistic, recovering LDA as a special case and capturing nonlinear decision boundaries in the original feature space. We prove that EFDA is asymptotically calibrated and statistically efficient under correct specification, and we generalise it to K ≥ 2 classes and multivariate data. Through extensive simulation across five exponential-family distributions (Weibull, Gamma, Exponential, Poisson, Negative Binomial), EFDA matches the classification accuracy of LDA, QDA, and logistic regression while reducing Expected Calibration Error (ECE) by 2–6×, a gap that is *structural*: it persists for all n and across all class-imbalance levels, because misspecified models remain asymptotically miscalibrated. We further prove and empirically confirm that EFDA's log-odds estimator approaches the Cramér–Rao bound under correct specification, and is the only estimator in our comparison whose mean squared error converges to zero. Complete derivations are provided for nine distributions. Finally, we formally verify all four theoretical propositions in Lean 4, using Aristotle (Harmonic) and OpenGauss (Math, Inc.) as proof generators, with all outputs independently machine-checked by AXLE (Axiom).
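The abstract's central empirical claim concerns Expected Calibration Error. For readers unfamiliar with the metric, here is a generic binned ECE estimator for the binary case (equal-width confidence bins); this is a standard formulation, not the paper's evaluation code.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary classification: partition predictions by
    confidence, then average |accuracy - mean confidence| per bin,
    weighted by bin size."""
    conf = np.maximum(probs, 1 - probs)          # confidence of predicted class
    pred = (probs >= 0.5).astype(int)
    correct = (pred == labels)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece, n = 0.0, len(labels)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.sum() / n * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A model that reports 90% confidence and is right 90% of the time scores ECE ≈ 0; the paper's structural-miscalibration argument is that a misspecified generative model's confidences stay biased away from empirical accuracy even as n grows, so this quantity does not vanish.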