GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning
arXiv cs.LG / 5/5/2026
Key Points
- Self-supervised pre-training on graphs is computationally expensive, but the paper finds that subsampling 50% of the pre-training graphs preserves over 96% of downstream performance, pointing to substantial redundancy in pre-training corpora.
- It proposes GraphSculptor, a label-free method for building pre-training coresets by combining intrinsic structural signals with contextual semantic signals derived from graph-to-text descriptions.
- Structural diversity is computed from intrinsic graph statistics, while semantic diversity is obtained by encoding generated graph descriptions with a pre-trained language model (sketched in the first code block after this list).
- GraphSculptor merges both views into a unified metric space and uses cluster-aware selection to maintain joint structural-semantic diversity (see the second sketch below).
- The authors provide a theoretical loss-gap bound and show experimentally that a 10% coreset can reach 99.6% of full-data performance while cutting pre-training time by nearly 90%.
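To make the two diversity views concrete, here is a minimal sketch of how the structural and semantic signals could be computed. The specific statistics, the template in `describe_graph`, and the `all-MiniLM-L6-v2` encoder are illustrative assumptions, not the paper's exact choices.

```python
# Sketch of the two diversity views: intrinsic graph statistics (structural)
# and language-model embeddings of graph-to-text descriptions (semantic).
# All names and parameter choices here are assumptions for illustration.
import networkx as nx
import numpy as np
from sentence_transformers import SentenceTransformer

def structural_features(g: nx.Graph) -> np.ndarray:
    """Intrinsic graph statistics used as the structural view (assumed set)."""
    degrees = [d for _, d in g.degree()]
    return np.array([
        g.number_of_nodes(),
        g.number_of_edges(),
        float(np.mean(degrees)) if degrees else 0.0,
        nx.density(g),
        nx.average_clustering(g),
    ])

def describe_graph(g: nx.Graph) -> str:
    """Template-based graph-to-text description (one plausible choice)."""
    return (f"A graph with {g.number_of_nodes()} nodes and "
            f"{g.number_of_edges()} edges, density {nx.density(g):.3f}.")

# Encode the descriptions with a pre-trained language model (assumed encoder).
lm = SentenceTransformer("all-MiniLM-L6-v2")

graphs = [nx.erdos_renyi_graph(30, p) for p in (0.05, 0.1, 0.2)]
X_struct = np.stack([structural_features(g) for g in graphs])
X_sem = lm.encode([describe_graph(g) for g in graphs])
```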
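And a hedged sketch of the selection step: both views are normalized and concatenated into one unified space, then a clustering pass spreads the selection budget across clusters. The use of KMeans, the per-cluster quota, and the closest-to-centroid rule are stand-in assumptions for the paper's cluster-aware selection, not its exact procedure.

```python
# Sketch of unified-space construction plus cluster-aware coreset selection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def select_coreset(X_struct, X_sem, ratio=0.1, n_clusters=10, seed=0):
    # Put both views on a comparable scale, then concatenate into one space.
    X = np.hstack([StandardScaler().fit_transform(X_struct),
                   StandardScaler().fit_transform(X_sem)])
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X)
    budget = max(1, int(ratio * len(X)))
    picked = []
    # Spread the budget across clusters to keep joint diversity.
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        if len(idx) == 0:
            continue
        take = max(1, int(round(budget * len(idx) / len(X))))
        # Within a cluster, prefer points closest to the centroid (assumed rule).
        d = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        picked.extend(idx[np.argsort(d)[:take]].tolist())
    return sorted(picked[:budget])
```

With a 10% ratio, the returned indices would play the role of the coreset whose pre-training cost the paper reports cutting by nearly 90%.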