NucEval: A Robust Evaluation Framework for Nuclear Instance Segmentation

arXiv cs.CV / 5/6/2026


Key Points

  • The paper introduces NucEval, a unified evaluation framework aimed at improving how nuclear instance segmentation is assessed in computational pathology.
  • It identifies four often-underappreciated problems in the evaluation pipeline (vague regions, score normalization, overlapping instances, and border uncertainty) and proposes a corresponding fix for each.
  • NucEval is tested on the NuInsSeg dataset plus two external datasets, using three CNN- and ViT-based segmentation models to show how the proposed changes affect instance segmentation metrics.
  • The authors make the code, guidelines, and example usage publicly available to support robust and reproducible evaluation across studies.
  • Overall, the work argues that the reported performance of nuclear instance segmentation systems depends substantially on the evaluation methodology, not only on the models themselves.

Abstract

In computational pathology, nuclear instance segmentation is a fundamental task with many downstream clinical applications. With the advent of deep learning, many approaches, including convolutional neural networks (CNNs) and vision transformers (ViTs), have been proposed for this task, along with both machine learning-based and non-machine learning-based pre- and post-processing techniques to further boost performance. However, one fundamental aspect that has received less attention is the evaluation pipeline. In this study, we identify four key issues associated with nuclear instance segmentation evaluation and propose corresponding solutions. Our proposed modifications, namely handling vague regions, score normalization, overlapping instances, and border uncertainty, are integrated into a unified framework called NucEval, which enables robust evaluation of nuclear instance segmentation. We evaluate this pipeline using the NuInsSeg dataset, which provides unique characteristics that make it particularly suitable for this study, as well as two additional external datasets, with three CNN- and ViT-based nuclear instance segmentation models, to demonstrate the impact of these modifications on instance segmentation metrics. The code, along with complete guidelines and illustrative examples, is publicly available at: https://github.com/masih4/nuc_eval.
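
To make the first of the four issues concrete, the sketch below shows one plausible way that excluding vague regions can change an instance-level score. It is not taken from the NucEval repository: the function name `instance_f1`, the `ignore` mask argument, the IoU threshold of 0.5, and the toy label maps are all assumptions made for illustration, and the actual framework may handle vague regions differently.

```python
import numpy as np


def instance_f1(gt, pred, ignore=None, iou_thr=0.5):
    """Instance-level F1 between two label maps (0 = background).

    Hypothetical helper, not the NucEval API. `ignore` is an optional
    boolean mask of vague pixels that are removed from both maps before
    matching. `iou_thr` should be at least 0.5 so that each ground-truth
    nucleus can match at most one prediction.
    """
    gt = gt.copy()
    pred = pred.copy()
    if ignore is not None:
        # Assumption: pixels in vague regions count for neither side.
        gt[ignore] = 0
        pred[ignore] = 0

    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]

    tp = 0
    for g in gt_ids:
        g_mask = gt == g
        # Only predictions that overlap this nucleus can match it.
        for p in np.unique(pred[g_mask]):
            if p == 0:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            if inter / union > iou_thr:
                tp += 1
                break

    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy example: one annotated nucleus, plus one prediction that falls
# entirely inside a region the annotators marked as vague.
gt = np.zeros((6, 6), dtype=int)
gt[0:3, 0:3] = 1

pred = np.zeros((6, 6), dtype=int)
pred[0:3, 0:3] = 1
pred[4:6, 4:6] = 2

ignore = np.zeros((6, 6), dtype=bool)
ignore[4:6, 4:6] = True

print(round(instance_f1(gt, pred), 3))          # 0.667
print(round(instance_f1(gt, pred, ignore), 3))  # 1.0
```

In this toy case the prediction inside the vague region is counted as a false positive when the mask is not used, which lowers F1 from 1.0 to about 0.667 even though the model made no error on any annotated nucleus.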