CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report

arXiv cs.CV / 3/20/2026

Key Points

CytoSyn is a foundation latent diffusion model for histopathology that enables guided generation of highly realistic and diverse H&E-stained images; the authors report state-of-the-art results in an extensive benchmark.
The work explores methodological improvements, training set scaling, sampling strategies, and slide-level overfitting, culminating in the improved CytoSyn-v2, and includes an in-depth comparison with PixCell.
CytoSyn was trained on over 10,000 TCGA diagnostic whole-slide images spanning 32 cancer types, and the authors observe strong cross-domain generalization, including generating inflammatory bowel disease images despite being trained on oncology slides.
To support the research community, the authors publicly release CytoSyn’s weights, training/validation datasets, and a sample of synthetic images on HuggingFace.

Abstract

Computational pathology has made significant progress in recent years, fueling advances in both fundamental disease understanding and clinically ready tools. This evolution is driven by the availability of large amounts of digitized slides and specialized deep learning methods and models. Multiple self-supervised foundation feature extractors have been developed, enabling downstream predictive applications from cell segmentation to tumor sub-typing and survival analysis. In contrast, generative foundation models designed specifically for histopathology remain scarce. Such models could address tasks that are beyond the capabilities of feature extractors, such as virtual staining. In this paper, we introduce CytoSyn, a state-of-the-art foundation latent diffusion model that enables the guided generation of highly realistic and diverse histopathology H&E-stained images, as shown in an extensive benchmark. We explored methodological improvements, training set scaling, sampling strategies and slide-level overfitting, culminating in the improved CytoSyn-v2, and compared our work to PixCell, a state-of-the-art model, in an in-depth manner. This comparison highlighted the strong sensitivity of both diffusion models and performance metrics to preprocessing-specific details such as JPEG compression. Our model has been trained on a dataset obtained from more than 10,000 TCGA diagnostic whole-slide images of 32 different cancer types. Despite being trained only on oncology slides, it maintains state-of-the-art performance generating inflammatory bowel disease images. To support the research community, we publicly release CytoSyn's weights, its training and validation datasets, and a sample of synthetic images in this repository: https://huggingface.co/Owkin-Bioptimus/CytoSyn.
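
The abstract mentions slide-level overfitting without describing how it was measured. One common way to check for it is to hold out whole slides, so that no validation tile comes from a slide seen during training. The sketch below illustrates that idea only; it is an assumption about a reasonable setup, not the authors' protocol, and the function name, the tile-to-slide mapping, and the 10% default are hypothetical.

```python
import random
from collections import defaultdict


def split_tiles_by_slide(tile_to_slide, val_fraction=0.1, seed=0):
    """Split tiles into train/validation lists so no slide appears in both.

    tile_to_slide: dict mapping a tile identifier to its source slide identifier
    (hypothetical structure, chosen for illustration).
    """
    # Group tiles under the slide they were cut from.
    tiles_by_slide = defaultdict(list)
    for tile_id, slide_id in tile_to_slide.items():
        tiles_by_slide[slide_id].append(tile_id)

    # Shuffle slides (not tiles) with a fixed seed for reproducibility.
    slide_ids = sorted(tiles_by_slide)
    random.Random(seed).shuffle(slide_ids)

    # Hold out at least one whole slide for validation.
    n_val = max(1, round(len(slide_ids) * val_fraction))
    val_slides = set(slide_ids[:n_val])

    train = [t for s in slide_ids[n_val:] for t in tiles_by_slide[s]]
    val = [t for s in slide_ids[:n_val] for t in tiles_by_slide[s]]
    return train, val, val_slides


if __name__ == "__main__":
    # Toy mapping: 10 slides with 5 tiles each (made-up identifiers).
    mapping = {f"s{s}_t{t}": f"slide_{s}" for s in range(10) for t in range(5)}
    train, val, val_slides = split_tiles_by_slide(mapping, val_fraction=0.2)
    print(len(train), len(val), sorted(val_slides))
```

Splitting at the slide level rather than the tile level means a gap between training and validation metrics reflects slide-specific memorization rather than leakage between neighboring tiles of the same slide.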