GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit
arXiv cs.CV / 3/24/2026
Key Points
- The article introduces GOLDMARK, a standardized benchmarking framework for computational biomarkers derived from H&E whole-slide images using AI/pathology foundation models (PFMs).
- GOLDMARK addresses gaps in computational pathology by releasing structured intermediate representations (e.g., tile coordinate maps, per-slide feature embeddings), quality-control metadata, predefined patient splits, and standardized evaluation outputs.
- Models are trained on a curated TCGA cohort with clinically actionable OncoKB level 1–3 labels and evaluated on an independent MSKCC cohort with reciprocal testing to assess cross-site generalization.
- Across 33 tumor-biomarker tasks, the reported mean AUROC is 0.689 on TCGA and 0.630 on MSKCC; restricted to the eight highest-performing tasks, the means are 0.831 and 0.801, respectively.
- The study finds that differences between encoders are modest compared with task-to-task variability, and that the strongest tasks align with known morphology-genomics associations. The framework is presented as a basis for reproducible method comparison on the path to clinical-grade deployment.
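
To make the headline numbers concrete, the sketch below shows one way a per-task AUROC and the two reported aggregates (mean over all tasks, mean over the top-k tasks) could be computed. It is an illustration only, not GOLDMARK's evaluation code: the function names `auroc` and `summarize`, the task names, and all numbers in the example are hypothetical, and the summary does not say how the eight highest-performing tasks were selected.

```python
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Equivalent to the normalized Mann-Whitney U statistic; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(per_task_auroc, top_k):
    """Return (mean over all tasks, mean over the top_k best tasks)."""
    values = sorted(per_task_auroc.values(), reverse=True)
    return mean(values), mean(values[:top_k])


# Hypothetical toy data: binary biomarker labels and slide-level scores.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical per-task AUROCs for one cohort (not values from the paper).
toy_tasks = {"task_a": 0.9, "task_b": 0.7, "task_c": 0.5}
overall_mean, top2_mean = summarize(toy_tasks, top_k=2)
```

Ranking tasks by the same AUROC that is then averaged makes the top-k mean an optimistic summary, so it is best read alongside the all-task mean rather than in place of it.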