Active Measurement of Two-Point Correlations

arXiv cs.CV / 4/8/2026

Key Points

  • The paper addresses how to measure two-point correlation functions (2PCF) when only a small, property-defined subset of points (e.g., star clusters in astronomy) is relevant.
  • It proposes a human-in-the-loop framework that uses a pre-trained classifier to adaptively choose the most informative points for human annotation.
  • After each annotation, the method produces unbiased estimates of pair counts across multiple distance bins simultaneously.
  • It introduces a novel unbiased estimator, a sampling strategy, and confidence interval construction to achieve statistically grounded scalability.
  • Compared with straightforward Monte Carlo methods, the approach lowers variance substantially while reducing the required annotation effort.
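
To make the key points concrete, here is a minimal sketch of classifier-guided, unbiased pair-count estimation. It is not the paper's estimator or sampling strategy. It swaps in a plain importance-sampling (Hansen-Hurwitz) scheme with a fixed proposal, whereas the paper describes an adaptive one. The function names, the `oracle` callback standing in for the human annotator, and the choice of proposal (product of the two classifier scores) are all assumptions made for illustration.

```python
import numpy as np

def pair_bin_index(points, edges):
    """Return (i, j, bin) for every unordered pair whose separation falls in a bin."""
    i, j = np.triu_indices(len(points), k=1)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    b = np.digitize(d, edges) - 1            # bin b covers [edges[b], edges[b+1])
    keep = (b >= 0) & (b < len(edges) - 1)
    return i[keep], j[keep], b[keep]

def exact_pair_counts(points, labels, edges):
    """Ground-truth target-target pair counts (needs every label; for checking only)."""
    i, j, b = pair_bin_index(points, edges)
    both = labels[i] & labels[j]
    return np.bincount(b[both], minlength=len(edges) - 1)

def estimate_pair_counts(points, scores, oracle, edges, n_samples, rng):
    """Importance-sampled estimate of target-target pair counts in each bin.

    Assumption: pairs are drawn with probability proportional to the product
    of the two classifier scores. Scores must be strictly positive so that
    every pair can be drawn, which is what keeps the estimate unbiased.
    """
    scores = np.asarray(scores, dtype=float)
    if np.any(scores <= 0):
        raise ValueError("scores must be strictly positive")
    i, j, b = pair_bin_index(points, edges)
    q = scores[i] * scores[j]
    q = q / q.sum()                          # proposal distribution over pairs
    draws = rng.choice(len(q), size=n_samples, p=q)

    labels = {}                              # cache: each point is annotated at most once
    def label(k):
        k = int(k)
        if k not in labels:
            labels[k] = bool(oracle(k))
        return labels[k]

    est = np.zeros(len(edges) - 1)
    for k in draws:
        if label(i[k]) and label(j[k]):
            est[b[k]] += 1.0 / q[k]          # inverse-probability weight
    return est / n_samples, len(labels)
```

Each draw contributes to exactly one distance bin, so a single annotation budget updates all bins at once. The better the scores separate targets from non-targets, the more of the budget lands on pairs that actually count, which is the intuition behind the variance reduction over uniform Monte Carlo sampling.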

Abstract

Two-point correlation functions (2PCF) are widely used to characterize how points cluster in space. In this work, we study the problem of measuring the 2PCF over a large set of points, restricted to a subset satisfying a property of interest. An example comes from astronomy, where scientists measure the 2PCF of star clusters, which make up only a tiny subset of possible sources within a galaxy. This task typically requires careful labeling of sources to construct catalogs, which is time-consuming. We present a human-in-the-loop framework for efficient estimation of the 2PCF of target sources. By leveraging a pre-trained classifier to guide sampling, our approach adaptively selects the most informative points for human annotation. After each annotation, it produces unbiased estimates of pair counts across multiple distance bins simultaneously. Compared to simple Monte Carlo approaches, our method achieves substantially lower variance while significantly reducing annotation effort. We introduce a novel unbiased estimator, sampling strategy, and confidence interval construction that together enable scalable and statistically grounded measurement of two-point correlations in astronomy datasets.
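
For readers unfamiliar with the quantity being estimated, the sketch below shows one standard way to turn per-bin pair counts into a 2PCF: the natural estimator, xi = (normalized DD / RR) - 1, which compares pair counts in the data (DD) with those in a random, unclustered catalog (RR). The abstract does not say which 2PCF estimator the authors use, so this choice, along with the function names and the use of a random catalog, is an assumption for illustration only.

```python
import numpy as np

def pair_counts(points, edges):
    """Count unordered pairs of points whose separation falls in each distance bin."""
    i, j = np.triu_indices(len(points), k=1)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    counts, _ = np.histogram(d, bins=edges)
    return counts

def natural_2pcf(data, randoms, edges):
    """Natural estimator of the 2PCF: xi = norm * DD / RR - 1 in each bin.

    Bins with no random pairs are returned as NaN rather than guessed.
    """
    dd = pair_counts(data, edges).astype(float)
    rr = pair_counts(randoms, edges).astype(float)
    n_d, n_r = len(data), len(randoms)
    # Rescale so catalogs of different sizes are comparable.
    norm = (n_r * (n_r - 1)) / (n_d * (n_d - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        xi = np.where(rr > 0, norm * dd / rr - 1.0, np.nan)
    return xi
```

A positive value in a bin means more pairs at that separation than an unclustered catalog would give; zero means no excess clustering. In the setting described above, the DD counts are the expensive part, because deciding which sources belong to the target subset requires human labels.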