A proposal for PU classification under Non-SCAR using clustering and logistic model
arXiv stat.ML / 4/21/2026
Key Points
- The paper proposes a PU (Positive-Unlabeled) classification approach for cases where the SCAR (Selected Completely At Random) assumption does not hold, using a simple cluster-cleaning method.
- It first generates “cleaning labels” via 2-means clustering, then fits a logistic regression on the cleaned data, assigning the positive label to observations in the positive cluster together with the additional observations already known to be true positives.
- Remaining samples are labeled as negative, enabling the classifier to learn from the cleaned PU structure.
- The method is evaluated on 11 real machine-learning datasets plus a synthetic dataset, showing that the clustering step can still be effective under SCAR violations.
- The study also assesses robustness, finding that the LassoJoint method exhibits moderate robustness to perturbations of the SCAR condition.
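
The cluster-cleaning steps summarized above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the paper's implementation. Several details are assumptions made for the example: the helper names (`two_means`, `clean_labels`, `fit_logistic`, `run_demo`), the rule that the "positive" cluster is the one holding more labeled positives, the gradient-descent settings, and the synthetic two-blob data whose labeling probability depends on the features (so SCAR is violated by construction).

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def two_means(X, iters=100):
    """Lloyd's algorithm with k=2; returns a 0/1 cluster id per row."""
    # Deterministic init (assumption): X[0] and the point farthest from it.
    centres = [X[0], max(X, key=lambda x: dist2(x, X[0]))]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [0 if dist2(x, centres[0]) <= dist2(x, centres[1]) else 1
                  for x in X]
        new = []
        for k in (0, 1):
            members = [x for x, a in zip(X, assign) if a == k]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centres[k])
        if new == centres:
            break
        centres = new
    return assign


def clean_labels(X, s):
    """Cleaning labels: 1 for the positive cluster plus all labeled positives."""
    cluster = two_means(X)
    votes = [0, 0]
    for c, si in zip(cluster, s):
        votes[c] += si
    # Assumption: the cluster with more labeled positives is the positive one.
    pos = 0 if votes[0] >= votes[1] else 1
    return [1 if (c == pos or si == 1) else 0 for c, si in zip(cluster, s)]


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Logistic regression fitted by full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def run_demo(seed=0, n=200):
    """Synthetic check: two Gaussian blobs, feature-dependent labeling."""
    rng = random.Random(seed)
    X, y_true = [], []
    for label, mu in ((1, 2.0), (0, -2.0)):
        for _ in range(n):
            X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
            y_true.append(label)
    # Non-SCAR: a positive is labeled with probability depending on x[0].
    s = [1 if y == 1 and rng.random() < sigmoid(x[0] - 2.0) else 0
         for x, y in zip(X, y_true)]
    y_clean = clean_labels(X, s)
    w, b = fit_logistic(X, y_clean)
    pred = [1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            for x in X]
    clean_acc = sum(a == t for a, t in zip(y_clean, y_true)) / len(X)
    clf_acc = sum(p == t for p, t in zip(pred, y_true)) / len(X)
    return clean_acc, clf_acc


if __name__ == "__main__":
    print(run_demo())
```

On well-separated blobs like these, the cleaned labels closely track the true classes even though labeling depends on the features. On overlapping classes the clustering step would be expected to do worse, which is the kind of question the paper's experiments on real datasets address.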