Identity-Aware U-Net: Fine-grained Cell Segmentation via Identity-Aware Representation Learning

arXiv cs.CV / 4/14/2026

Key Points

  • The paper proposes Identity-Aware U-Net (IAU-Net), which targets fine-grained cell/object segmentation where instances have highly similar shapes and ambiguous boundaries.
  • IAU-Net extends a U-Net encoder-decoder with an auxiliary embedding branch that learns identity-discriminative representations alongside the main pixel-level mask prediction.
  • It improves separation of morphologically similar objects by adding triplet-based metric learning that groups target-consistent embeddings and pushes apart hard negatives.
  • Experiments on benchmarks that include cell segmentation reportedly show promising results, particularly in cases with similar contours, dense layouts, and ambiguous boundaries.
  • The work frames segmentation as an identity-aware problem, aiming to move beyond category-level localization toward reliable instance discrimination in dense prediction tasks.
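
One way to read the joint design described in the points above is as a two-term training objective. The summary does not give the exact loss, so the additive combination, the weight \lambda, the margin m, and the Euclidean distance below are assumptions chosen for illustration. Here f is the embedding branch, and a, p, n are an anchor, a target-consistent positive, and a hard negative.

```latex
% Hypothetical joint objective: segmentation loss plus a weighted triplet term.
% \lambda (weight) and m (margin) are assumed hyperparameters, not values from the paper.
\mathcal{L} = \mathcal{L}_{\text{seg}} + \lambda \, \mathcal{L}_{\text{tri}},
\qquad
\mathcal{L}_{\text{tri}} = \max\bigl(0,\; \lVert f(a) - f(p) \rVert_2 - \lVert f(a) - f(n) \rVert_2 + m \bigr)
```

The triplet term is zero once the negative is farther from the anchor than the positive by at least m, so only morphologically confusable negatives contribute gradient.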

Abstract

Precise segmentation of objects with highly similar shapes remains a challenging problem in dense prediction, especially in scenarios with ambiguous boundaries, overlapping instances, and weak inter-instance visual differences. While conventional segmentation models are effective at localizing object regions, they often lack the discriminative capacity required to reliably distinguish a target object from morphologically similar distractors. In this work, we study fine-grained object segmentation from an identity-aware perspective and propose Identity-Aware U-Net (IAU-Net), a unified framework that jointly models spatial localization and instance discrimination. Built upon a U-Net-style encoder-decoder architecture, our method augments the segmentation backbone with an auxiliary embedding branch that learns discriminative identity representations from high-level features, while the main branch predicts pixel-accurate masks. To enhance robustness in distinguishing objects with near-identical contours or textures, we further incorporate triplet-based metric learning, which pulls target-consistent embeddings together and separates them from hard negatives with similar morphology. This design enables the model to move beyond category-level segmentation and acquire a stronger capability for precise discrimination among visually similar objects. Experiments on benchmarks including cell segmentation demonstrate promising results, particularly in challenging cases involving similar contours, dense layouts, and ambiguous boundaries.
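
The triplet-based metric learning mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the margin of 1.0, the Euclidean distance, and the rule of picking the closest negative as the "hard" one are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (margin value is an assumption).

    Zero when the negative is at least `margin` farther from the anchor
    than the positive; positive otherwise.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor (hypothetical mining rule)."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Toy 2-D embeddings: one target-consistent positive and two distractors.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negatives = [[3.0, 4.0], [0.0, 1.5]]

hard = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, hard)          # close distractor still violates the margin
loss_easy = triplet_loss(anchor, positive, negatives[0])  # distant distractor contributes nothing
```

In this toy case the distant distractor yields zero loss, while the nearby one does not, which is why mining hard negatives with similar morphology matters for the kind of discrimination the abstract describes.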