Pseudo-Label NCF for Sparse OHC Recommendation: Dual Representation Learning and the Separability-Accuracy Trade-off

arXiv cs.AI / 3/27/2026

Key Points

  • The paper proposes “Pseudo-Label NCF” to improve recommendation in Online Health Communities (OHCs) under extreme interaction sparsity, using pseudo labels derived from the cosine similarity between each user's survey vector and each support group's feature profile.
  • It extends Neural Collaborative Filtering models (MF, MLP, and NeuMF) with an auxiliary pseudo-label objective that learns two embedding spaces: one for ranking and one for semantic alignment.
  • Experiments on 165 users and 498 support groups using a leave-one-out cold-start protocol show that pseudo-label variants improve ranking performance across all tested architectures.
  • The authors find that the pseudo-label embedding spaces yield higher cosine silhouette scores (better separability) than baselines, but that embedding separability and ranking accuracy are negatively correlated, suggesting a trade-off between interpretability and performance.
  • Overall, the results indicate that survey-derived pseudo labels can both improve sparse recommendation quality and produce more interpretable, task-specific embeddings.
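
The pseudo-label construction described in the points above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the linear rescaling (cos + 1) / 2 is only one assumed way of mapping cosine similarity to [0, 1] (the summary does not say which mapping the paper uses), and the sketch assumes the survey vector and the group feature profile share the same dimensionality.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pseudo_label(user_vec, group_vec):
    """Map cosine similarity from [-1, 1] to [0, 1].

    ASSUMPTION: the linear map (cos + 1) / 2 is illustrative only; the
    source states that the similarity is mapped to [0, 1] but not how.
    """
    cos = cosine_similarity(user_vec, group_vec)
    # Clamp to guard against tiny floating-point overshoot.
    return min(1.0, max(0.0, (cos + 1.0) / 2.0))


# Hypothetical toy vectors (the paper's intake vectors are 16-dimensional).
user = [1.0, 0.0, 1.0, 0.0]
group = [1.0, 0.0, 0.0, 1.0]
label = pseudo_label(user, group)
```

In a setup like this, the resulting value would serve as the regression target for the auxiliary objective, while the observed user-group interactions remain the target for the ranking objective.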

Abstract

Online Health Communities connect patients for peer support, but users face a discovery challenge when they have minimal prior interactions to guide personalization. We study recommendation under extreme interaction sparsity in a survey-driven setting where each user provides a 16-dimensional intake vector and each support group has a structured feature profile. We extend Neural Collaborative Filtering architectures, including Matrix Factorization, Multi-Layer Perceptron, and NeuMF, with an auxiliary pseudo-label objective derived from survey-group feature alignment using cosine similarity mapped to [0, 1]. The resulting Pseudo-Label NCF learns dual embedding spaces: main embeddings for ranking and pseudo-label embeddings for semantic alignment. We evaluate on a dataset of 165 users and 498 support groups using a leave-one-out protocol that reflects cold-start conditions. All pseudo-label variants improve ranking performance: MLP improves HR@5 from 2.65% to 5.30%, NeuMF from 4.46% to 5.18%, and MF from 4.58% to 5.42%. Pseudo-label embedding spaces also show higher cosine silhouette scores than baseline embeddings, with MF improving from 0.0394 to 0.0684 and NeuMF from 0.0263 to 0.0653. We further observe a negative correlation between embedding separability and ranking accuracy, indicating a trade-off between interpretability and performance. These results show that survey-derived pseudo labels improve recommendation under extreme sparsity while producing interpretable, task-specific embedding spaces.
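
For readers unfamiliar with the HR@5 figures quoted in the abstract, the metric under a leave-one-out protocol can be sketched as below. This is a generic illustration under stated assumptions: the function name and data layout are hypothetical, and the abstract does not say whether the held-out group is ranked against all 498 groups or against a sampled set of negatives.

```python
def hit_rate_at_k(scores_per_user, held_out, k=5):
    """Fraction of users whose held-out item appears in their top-k ranking.

    scores_per_user: dict mapping user id -> {item id: model score}
    held_out:        dict mapping user id -> the single held-out item id
    ASSUMPTION: every user's candidate set already contains the held-out item.
    """
    if not scores_per_user:
        return 0.0
    hits = 0
    for user, scores in scores_per_user.items():
        # Rank candidate items by descending score and keep the top k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if held_out[user] in top_k:
            hits += 1
    return hits / len(scores_per_user)


# Hypothetical toy scores for two users over three support groups.
scores = {
    "u1": {"g1": 0.9, "g2": 0.1, "g3": 0.5},
    "u2": {"g1": 0.2, "g2": 0.8, "g3": 0.4},
}
held_out = {"u1": "g1", "u2": "g1"}
hr_at_1 = hit_rate_at_k(scores, held_out, k=1)
```

Because each user contributes exactly one held-out item, the metric equals the share of users who receive a hit.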