Dense Point-to-Mask Optimization with Reinforced Point Selection for Crowd Instance Segmentation

arXiv cs.CV / 4/3/2026



Key Points

The paper addresses crowd instance segmentation, where datasets commonly provide point labels but region labels (e.g., boxes) are rare and often inaccurate, which limits downstream accuracy for counting and localization.
It introduces Dense Point-to-Mask Optimization (DPMO), combining SAM with a Nearest Neighbor Exclusive Circle (NNEC) constraint to convert dense crowd point annotations into improved mask annotations (with optional manual correction).
For prediction in dense scenes, it proposes Reinforced Point Selection (RPS), which uses Group Relative Policy Optimization (GRPO) to select the best point from sampled candidates before generating instance outputs.
Experiments report state-of-the-art performance on multiple crowd datasets (ShanghaiTech, UCF-QNRF, JHU-CROWD++, NWPU-Crowd), and the authors show that mask-supervised losses can significantly improve counting accuracy across models.
Overall, the work highlights that dense crowd segmentation can be improved by better point-to-mask pseudo-label generation and by reinforcement-style point selection rather than directly applying standard foundation-model prompting.
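
The summary does not define how the Nearest Neighbor Exclusive Circle is built, so the following is only a rough sketch under one stated assumption: each annotated point owns a circle whose radius is half the distance to its nearest neighboring point, so that no two circles overlap, and a mask proposed for that point is clipped to its circle. The function names `nnec_radii` and `clip_mask_to_circle` and the `scale` parameter are hypothetical and do not come from the paper.

```python
import math

def nnec_radii(points, scale=0.5):
    """Return one radius per point: scale * distance to its nearest neighbor.

    ASSUMPTION: scale=0.5 (half the nearest-neighbor distance) is an
    illustrative choice, not a value taken from the paper.
    """
    radii = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            (math.hypot(xi - xj, yi - yj)
             for j, (xj, yj) in enumerate(points) if j != i),
            default=math.inf,  # a lone point has no neighbor to exclude
        )
        radii.append(scale * nearest)
    return radii

def clip_mask_to_circle(mask_pixels, center, radius):
    """Keep only the (x, y) mask pixels that fall inside the point's circle."""
    cx, cy = center
    return {(x, y) for (x, y) in mask_pixels
            if math.hypot(x - cx, y - cy) <= radius}

points = [(0, 0), (4, 0), (10, 0)]
radii = nnec_radii(points)
# A hypothetical mask proposed for the first point that leaks toward the second.
clipped = clip_mask_to_circle({(0, 0), (1, 0), (3, 0)}, points[0], radii[0])
```

In this toy case the pixel at `(3, 0)` lies outside the first point's circle and is dropped, which is the kind of leakage between adjacent people that such a constraint is meant to prevent in dense scenes.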

Abstract

Crowd instance segmentation is a crucial task with a wide range of applications, including surveillance and transportation. Currently, point labels are common in crowd datasets, while region labels (e.g., boxes) are rare and inaccurate. The masks obtained through segmentation help to improve the accuracy of region labels and resolve the correspondence between individual location coordinates and crowd density maps. However, directly applying currently popular large foundation models such as SAM does not yield ideal results in dense crowds. To this end, we first propose Dense Point-to-Mask Optimization (DPMO), which integrates SAM with the Nearest Neighbor Exclusive Circle (NNEC) constraint to generate dense instance segmentation from point annotations. With DPMO and manual correction, we obtain mask annotations from the existing point annotations for traditional crowd datasets. Then, to predict instance segmentation in dense crowds, we propose a Reinforced Point Selection (RPS) framework trained with Group Relative Policy Optimization (GRPO), which selects the best predicted point from a sampling of the initial point prediction. Through extensive experiments, we achieve state-of-the-art crowd instance segmentation performance on ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd datasets. Furthermore, we design new loss functions supervised by masks that boost counting performance across different models, demonstrating the significant role of mask annotations in enhancing counting accuracy.
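
As a minimal illustration of the group-relative idea behind GRPO mentioned in the abstract, the sketch below scores a group of sampled candidate points and normalizes each reward against the group mean and standard deviation. The reward values are made up, the reward definition (for example, mask quality for each candidate) is an assumption, and the helper names are hypothetical. It shows only the advantage computation, not the paper's RPS training procedure.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]

def best_candidate(candidates, rewards):
    """Return the candidate with the highest reward (ties go to the first)."""
    best_index = max(range(len(rewards)), key=lambda i: rewards[i])
    return candidates[best_index]

# Hypothetical candidate points sampled around one initial prediction,
# with made-up rewards standing in for a mask-quality score.
candidates = [(10, 12), (11, 12), (10, 13), (12, 14)]
rewards = [0.9, 0.5, 0.5, 0.1]
advantages = group_relative_advantages(rewards)
chosen = best_candidate(candidates, rewards)
```

Candidates that score above the group average receive positive advantages and those below receive negative ones, so the policy is pushed toward better points without a separately learned value baseline.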



