Semi-Supervised Learning with Balanced Deep Representation Distributions
arXiv cs.LG / 2026-03-24
Key Points
- The paper addresses semi-supervised text classification (SSTC) where self-training depends heavily on the accuracy of pseudo-labels for unlabeled data.
- It identifies a “margin bias” issue stemming from mismatched representation (feature) distributions between labels in SSTC.
- To reduce this bias, it introduces an angular margin loss and applies Gaussian linear transformations to balance the variance of label angles within each class.
- The proposed method, S2TC-BDD, constrains label angle variances using estimates computed over both labeled and pseudo-labeled texts during self-training iterations.
- Experiments in both multi-class and multi-label classification settings show that S2TC-BDD outperforms state-of-the-art SSTC methods, with the largest gains when labeled data is scarce.
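The variance-balancing idea above can be sketched in code. The snippet below is an illustrative approximation, not the paper's exact formulation: it computes the angle between each (normalized) feature vector and each class weight, then applies a Gaussian linear transformation that standardizes each class's angles and rescales them to a shared average standard deviation, so no class dominates the margin simply because its angle distribution is wider. The function name, argument names, and the choice of the mean variance as the shared target are all assumptions for illustration.

```python
import numpy as np

def balanced_angular_logits(features, class_weights, angle_means, angle_vars):
    """Illustrative sketch of variance-balanced angular logits.

    features:      (batch, dim) raw feature vectors
    class_weights: (num_classes, dim) class weight vectors
    angle_means:   (num_classes,) per-class mean angle, estimated over
                   labeled and pseudo-labeled texts (assumed given)
    angle_vars:    (num_classes,) per-class angle variance (assumed given)
    """
    # Cosine of the angle between L2-normalized features and class weights.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)  # (batch, num_classes), angles in [0, pi]

    # Gaussian linear transformation: standardize each class's angles,
    # then rescale to a shared (average) standard deviation so the
    # per-class angle variances are balanced.
    avg_std = np.sqrt(angle_vars.mean())
    theta_bal = (theta - angle_means) / np.sqrt(angle_vars) * avg_std + angle_means

    return np.cos(theta_bal)  # balanced logits in [-1, 1]
```

In a self-training loop, `angle_means` and `angle_vars` would be re-estimated each iteration from both labeled and pseudo-labeled texts, mirroring how S2TC-BDD constrains label angle variances during training.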
