Learning from Label Proportions with Dual-proportion Constraints

arXiv cs.LG / 2026-03-24


Key Points

  • The paper addresses Learning from Label Proportions (LLP), a weakly supervised setting in which only bag-level label proportions are available during training, yet the goal is to learn a classifier that predicts instance-level labels.
  • It proposes LLP-DC, a training approach that enforces Dual proportion Constraints at both the bag level (matching the mean prediction to the given proportions) and the instance level (using pseudo-labels consistent with the constraints).
  • The instance-level pseudo-label generation is formulated via a minimum-cost maximum-flow algorithm to produce hard pseudo-labels that satisfy the proportion requirements.
  • Experiments across multiple benchmark datasets and varying bag sizes show that LLP-DC improves consistently over prior LLP methods.
  • The authors provide public code for replication and further experimentation via the linked GitHub repository.
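
The bag-level constraint above can be sketched as a proportion-matching loss: average the per-instance class probabilities over a bag and penalize divergence from the given proportions. Below is a minimal pure-Python sketch; the specific loss form (cross-entropy between the given proportions and the mean prediction) is an assumption for illustration, and the paper may use a different divergence:

```python
import math

def bag_proportion_loss(probs, proportions):
    """Cross-entropy between the given bag-level class proportions and
    the mean predicted class distribution over the bag's instances.

    probs: list of per-instance class-probability vectors.
    proportions: given bag-level class proportions (sums to 1).
    """
    n, k = len(probs), len(proportions)
    # Mean predicted probability for each class across the bag.
    mean_pred = [sum(p[c] for p in probs) / n for c in range(k)]
    eps = 1e-12  # numerical safety inside the log
    return -sum(proportions[c] * math.log(mean_pred[c] + eps) for c in range(k))

# Example: a 3-instance bag with given proportions [2/3, 1/3].
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
print(bag_proportion_loss(probs, [2 / 3, 1 / 3]))
```

The loss is minimized when the bag's mean prediction exactly matches the given proportion vector, which is the alignment the bullet above describes.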

Abstract

Learning from Label Proportions (LLP) is a weakly supervised problem in which the training data comprise bags, that is, groups of instances, each annotated only with bag-level class label proportions, and the objective is to learn a classifier that predicts instance-level labels. This setting is widely applicable when privacy constraints limit access to instance-level annotations or when fine-grained labeling is costly or impractical. In this work, we introduce a method that leverages Dual proportion Constraints (LLP-DC) during training, enforcing them at both the bag and instance levels. Specifically, the bag-level training aligns the mean prediction with the given proportion, and the instance-level training fits hard pseudo-labels that satisfy the proportion constraint, where a minimum-cost maximum-flow algorithm is used to generate the hard pseudo-labels. Extensive experimental results across various benchmark datasets empirically validate that LLP-DC consistently improves over previous LLP methods across datasets and bag sizes. The code is publicly available at https://github.com/TianhaoMa5/CVPR2026_Findings_LLP_DC.
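
The instance-level step assigns hard pseudo-labels that exactly satisfy the bag's class counts while maximizing agreement with the model's predictions; the paper solves this as a minimum-cost maximum-flow problem. The sketch below illustrates the same optimization on a tiny bag by brute-force enumeration over label assignments, with cost taken as negative log-probability. This is an assumption-level stand-in for the MCMF solver and is feasible only for very small bags:

```python
import itertools
import math
from collections import Counter

def constrained_pseudo_labels(probs, counts):
    """Return the hard labeling that matches the required per-class
    counts exactly and minimizes total -log p(label | instance).

    probs: per-instance class-probability vectors.
    counts: required number of instances per class (sums to len(probs)).
    Brute force over all assignments here; the paper uses a
    minimum-cost maximum-flow algorithm to solve the same problem
    efficiently on realistic bag sizes.
    """
    k = len(counts)
    best, best_cost = None, float("inf")
    for labels in itertools.product(range(k), repeat=len(probs)):
        tally = Counter(labels)
        if [tally[c] for c in range(k)] != list(counts):
            continue  # violates the proportion (count) constraint
        cost = -sum(math.log(probs[i][c] + 1e-12) for i, c in enumerate(labels))
        if cost < best_cost:
            best, best_cost = list(labels), cost
    return best

# 4-instance bag, 2 classes, required counts [2, 2]: even though three
# instances favor class 0, only two may receive it.
probs = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
print(constrained_pseudo_labels(probs, [2, 2]))
```

Note how the count constraint forces the least-confident class-0 instance into class 1, which is exactly the behavior a per-instance argmax cannot provide and why a flow-based (or other combinatorial) solver is needed.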