Fractionally Supervised Classification with Maxima Nominated Samples

arXiv cs.LG / 4/29/2026

Key Points

Fractionally supervised classification (FSC) is a framework for combining labeled and unlabeled data, but prior work assumed that observations come from simple random sampling.
The article focuses on maxima nomination sampling, where the retained observation is an extreme order statistic (e.g., the maximum), which fundamentally changes the likelihood and breaks the standard FSC EM approach.
The authors propose a new latent-variable formulation that models both the class of the observed maximum and the latent composition of the remaining units in each set.
They derive a proper EM algorithm and a coherent weighted-likelihood FSC procedure tailored to maxima-nominated samples, and validate it with simulations and a real-data application.
Simulations on a rare-event contamination normal mixture show the method substantially outperforms a misspecified alternative that ignores the additional rank information inherent in maxima-nominated data.
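
To make concrete why retaining a maximum changes the likelihood, here is a short worked derivation under simplifying assumptions that are not stated in the source: each set has a fixed, known size k, its units are i.i.d. from a two-class mixture, and only the set maximum x is observed. The symbols k, \pi_g, f_g, F_g, and m are notation introduced here for illustration and may differ from the paper's.

```latex
\begin{align*}
f(x) &= \pi_0 f_0(x) + \pi_1 f_1(x), \qquad
F(x) = \pi_0 F_0(x) + \pi_1 F_1(x), \\
f_{\max}(x) &= k\, f(x)\, F(x)^{k-1}, \\
p(x, g) &= k\, \pi_g f_g(x)\, F(x)^{k-1}, \qquad g \in \{0, 1\}, \\
F(x)^{k-1} &= \sum_{m=0}^{k-1} \binom{k-1}{m}
  \bigl[\pi_1 F_1(x)\bigr]^{m} \bigl[\pi_0 F_0(x)\bigr]^{k-1-m}.
\end{align*}
```

Here g is the class of the observed maximum and m counts how many of the k-1 unobserved units belong to class 1, which is one way to read the "latent composition of the remaining units" in the key points above. Setting k = 1 recovers the simple random sampling density f(x), so the extra factor F(x)^{k-1} is exactly what a standard FSC likelihood omits.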

Abstract

Fractionally supervised classification (FSC) offers a flexible framework for combining labeled and unlabeled data in model-based classification, but existing formulations assume simple random sampling. In many applications, however, the retained observation is an extreme order statistic from a set rather than a randomly selected unit. This is particularly appealing when the target population is rare, since maxima nomination sampling (NS) can enrich the sample with the most informative observations, as in screening, environmental monitoring, repeated testing, and reliability studies. Under such designs, the likelihood function changes fundamentally, and the usual FSC EM construction is no longer valid. We develop FSC for nominated samples by introducing a latent representation that accounts for both the class membership of the observed maximum and the latent composition of the remaining units in the set. The resulting method yields a proper EM algorithm and a coherent weighted-likelihood FSC procedure for NS data. We present the methodology in general form, illustrate it for a rare-event contamination normal mixture, and show through simulation that it substantially improves on the misspecified alternative that ignores the extra rank information in such data. A real-data analysis demonstrates its practical value.
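
The effect of ignoring the sampling design can be seen in a small simulation. The sketch below is not the authors' EM or weighted-likelihood procedure. It assumes a fixed set size k, known component distributions N(0, 1) and N(mu1, 1), and a plain grid search over the rare-class proportion. All function names and parameter values are hypothetical choices made for illustration. It compares the likelihood that treats the maxima as a simple random sample with the order-statistic likelihood k f(x) F(x)^(k-1).

```python
import math
import random


def norm_pdf(x, mu):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(x, mu):
    """Distribution function of N(mu, 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))


def draw_maxima(n_sets, k, pi1, mu1, rng):
    """For each set, draw k units from the mixture and keep only the maximum."""
    maxima = []
    for _ in range(n_sets):
        units = [rng.gauss(mu1 if rng.random() < pi1 else 0.0, 1.0)
                 for _ in range(k)]
        maxima.append(max(units))
    return maxima


def srs_loglik(pi1, xs, mu1):
    """Misspecified log-likelihood: treats each maximum as a random unit."""
    return sum(math.log((1 - pi1) * norm_pdf(x, 0.0) + pi1 * norm_pdf(x, mu1))
               for x in xs)


def ns_loglik(pi1, xs, mu1, k):
    """Log-likelihood of set maxima: log k + log f(x) + (k - 1) log F(x)."""
    total = 0.0
    for x in xs:
        f = (1 - pi1) * norm_pdf(x, 0.0) + pi1 * norm_pdf(x, mu1)
        F = (1 - pi1) * norm_cdf(x, 0.0) + pi1 * norm_cdf(x, mu1)
        total += math.log(k) + math.log(f) + (k - 1) * math.log(F)
    return total


# Illustrative settings (assumptions, not values from the paper).
rng = random.Random(0)
k, pi_true, mu1 = 5, 0.05, 3.0
xs = draw_maxima(1000, k, pi_true, mu1, rng)

# Grid search over the rare-class proportion, 0.01 to 0.99.
grid = [i / 100 for i in range(1, 100)]
pi_srs = max(grid, key=lambda p: srs_loglik(p, xs, mu1))
pi_ns = max(grid, key=lambda p: ns_loglik(p, xs, mu1, k))

print(f"true pi1 = {pi_true}, SRS-likelihood estimate = {pi_srs}, "
      f"NS-likelihood estimate = {pi_ns}")
```

Because maxima over-represent the rare, high-mean class, the SRS-based estimate of the rare-class proportion is inflated, while the order-statistic likelihood corrects for the enrichment. A full treatment would also estimate the component parameters and use the labeled/unlabeled weighting that FSC provides, which this sketch leaves out.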
