CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification

arXiv cs.CV / 4/20/2026

Key Points

  • The CXR-LT 2026 challenge is a new multi-center benchmark for chest X-ray classification designed to better reflect the long-tailed prevalence of diseases and the open-world nature of clinical data.
  • Unlike prior benchmarks that relied on closed-set labels from a single institution and report-derived annotations, CXR-LT 2026 draws on 145,000+ images from PadChest and NIH Chest X-ray, with all development and test sets annotated by radiologists.
  • The challenge focuses on two tasks: robust multi-label classification over 30 known disease classes and open-world generalization to 6 unseen (out-of-distribution) rare disease classes.
  • The paper's evaluation indicates that vision-language foundation models improve both in-distribution and zero-shot performance, though reliably detecting rare findings across centers remains difficult.
  • The benchmark includes analyses of head-vs-tail performance, calibration, and cross-center generalization gaps, aiming to support realistic development and assessment of clinical AI systems.
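As a rough illustration of what a head-versus-tail analysis can look like, the sketch below computes per-class average precision and macro-averages it separately over frequent ("head") and rare ("tail") classes. This is an assumption-laden toy, not the challenge's evaluation code: the summary does not state which metric the organizers used, and the function names, the `tail_threshold` cutoff, and the two example classes are all hypothetical.

```python
from typing import Dict, List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    # Undefined when the class has no positive examples.
    return precision_sum / hits if hits else float("nan")


def head_tail_map(
    y_true: Dict[str, List[int]],
    y_score: Dict[str, List[float]],
    tail_threshold: int,
) -> Dict[str, float]:
    """Macro-average AP separately over head and tail classes.

    A class counts as 'tail' when it has fewer than `tail_threshold`
    positives (a hypothetical cutoff chosen for illustration).
    """
    groups: Dict[str, List[float]] = {"head": [], "tail": []}
    for name, labels in y_true.items():
        ap = average_precision(labels, y_score[name])
        if ap != ap:  # NaN check: skip classes without positives
            continue
        group = "tail" if sum(labels) < tail_threshold else "head"
        groups[group].append(ap)
    return {
        group: (sum(values) / len(values) if values else float("nan"))
        for group, values in groups.items()
    }


# Toy data: one frequent class and one rare class (hypothetical labels and scores).
y_true = {
    "frequent_finding": [1, 1, 0, 1, 0, 1],
    "rare_finding": [0, 0, 1, 0, 0, 0],
}
y_score = {
    "frequent_finding": [0.9, 0.8, 0.3, 0.7, 0.2, 0.6],
    "rare_finding": [0.4, 0.1, 0.3, 0.5, 0.2, 0.05],
}

result = head_tail_map(y_true, y_score, tail_threshold=2)
print(result)
```

In this toy case the frequent class is ranked perfectly while the single rare positive lands third, so the tail score falls well below the head score. That gap is the kind of signal a head-versus-tail breakdown is meant to expose.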

Abstract

Chest X-ray (CXR) interpretation is hindered by the long-tailed distribution of pathologies and the open-world nature of clinical environments. Existing benchmarks often rely on closed-set classes from a single institution, failing to capture the prevalence of rare diseases or the appearance of novel findings. To address this, we present the CXR-LT challenge. The first event, CXR-LT 2023, established a large-scale benchmark for long-tailed multi-label CXR classification and identified key challenges in rare disease recognition. CXR-LT 2024 further expanded the label space and introduced a zero-shot task to study generalization to unseen findings. Building on the success of CXR-LT 2023 and 2024, this third iteration of the benchmark introduces a multi-center dataset comprising over 145,000 images from the PadChest and NIH Chest X-ray datasets. Additionally, all development and test sets in CXR-LT 2026 are annotated by radiologists, providing a more reliable and clinically grounded evaluation than report-derived labels. The challenge defines two core tasks this year: (1) Robust Multi-Label Classification on 30 known classes and (2) Open-World Generalization to 6 unseen (out-of-distribution) rare disease classes. This paper provides an overview of the CXR-LT 2026 challenge. We describe the data collection and annotation procedures, analyze solution strategies adopted by participating teams, and evaluate head-versus-tail performance, calibration, and cross-center generalization gaps. Our results show that vision-language foundation models improve both in-distribution and zero-shot performance, but detecting rare findings under multi-center shift remains challenging. Our study provides a foundation for developing and evaluating AI systems in realistic long-tailed and open-world clinical conditions.
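The abstract lists calibration among the evaluated properties without saying how it was measured. One common choice is expected calibration error (ECE), sketched below for a single binary finding. Treat it as an illustrative assumption rather than the paper's procedure: the function name, the equal-width binning, and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    labels: Sequence[int],
    probs: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Binary ECE: weighted mean gap between predicted probability and
    observed positive rate, computed over equal-width probability bins."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for label, prob in zip(labels, probs):
        # Clamp so that prob == 1.0 falls into the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))

    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(prob for _, prob in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


# Overconfident toy predictions: every score is extreme, yet half are wrong.
overconfident = expected_calibration_error(
    labels=[1, 0, 1, 0], probs=[0.9, 0.9, 0.1, 0.1], n_bins=2
)

# Well-calibrated toy predictions: 0.75 predicted, 3 of 4 positive.
calibrated = expected_calibration_error(
    labels=[1, 1, 1, 0], probs=[0.75, 0.75, 0.75, 0.75], n_bins=4
)

print(overconfident, calibrated)
```

In a long-tailed, multi-label setting one would typically compute such a score per class and then compare head and tail classes, since rare findings often have too few positives for the bin estimates to be stable.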