Copula-enhanced Vision Transformer for high myopia diagnosis through OU UWF fundus images

arXiv cs.CV / 5/4/2026


Key Points

  • The paper targets AI-assisted myopia screening by jointly performing two tasks from both-eye (OU) ultra-widefield fundus images: diagnosing OU high myopia (a binary outcome) and predicting axial length (a continuous outcome).
  • It proposes a Vision Transformer approach that uses residual adapters on a foundation model to capture both inter-ocular similarity and heterogeneity between the two eyes.
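  • To handle the mixed binary-continuous multitask outputs (an HM label and an AL value for each eye), the authors introduce a four-dimensional copula loss that models the conditional dependence among the responses through a Gaussian copula likelihood; a latent-variable expression of that likelihood makes the loss implementable in PyTorch.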
  • They develop a computationally efficient fast Monte Carlo Expectation Maximization (fMCEM) algorithm to estimate the copula parameters. They also formulate a multitask overfitting problem, which they call the stronger covariance phenomenon, show that it disturbs copula-parameter estimation, and prove that fMCEM remains numerically stable against it.
  • Experiments on an annotated OU ultra-widefield fundus dataset and on synthetic data show that the proposed method stably improves both classification and regression performance.
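The residual-adapter idea in the second key point can be illustrated with a minimal sketch. It assumes (this page does not specify the design) that a single shared layer stands in for the frozen Vision Transformer backbone and that each eye gets its own bottleneck adapter; `ResidualAdapter`, `encode_ou`, and the `"OD"`/`"OS"` keys are hypothetical names. It uses plain Python lists instead of PyTorch so it runs without dependencies, and the zero-initialised up-projection is a common adapter convention, not something the paper is confirmed to use.

```python
import math
import random


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rand_matrix(rows, cols, scale, rng):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ResidualAdapter:
    """Hypothetical bottleneck adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rand_matrix(bottleneck, dim, 0.1, rng)
        # Assumed zero-initialised up-projection: a fresh adapter is the identity map.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        hidden = [max(0.0, x) for x in matvec(self.w_down, h)]  # ReLU bottleneck
        delta = matvec(self.w_up, hidden)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection


def shared_layer(h, w_shared):
    """Stand-in for a frozen foundation-model layer shared by both eyes."""
    return [math.tanh(x) for x in matvec(w_shared, h)]


def encode_ou(h_od, h_os, w_shared, adapters):
    """Shared weights carry OU similarity; per-eye adapters carry heterogeneity."""
    z_od = adapters["OD"](shared_layer(h_od, w_shared))
    z_os = adapters["OS"](shared_layer(h_os, w_shared))
    return z_od, z_os


rng = random.Random(0)
dim = 8
w_shared = rand_matrix(dim, dim, 0.3, rng)
adapters = {"OD": ResidualAdapter(dim, 2, rng), "OS": ResidualAdapter(dim, 2, rng)}
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
z_od, z_os = encode_ou(x, x, w_shared, adapters)
print(z_od == z_os)  # True: untrained adapters add nothing eye-specific yet
```

Only the adapter weights would differ between the eyes during training, which is one way to let the two eyes share most parameters while still diverging where their images differ.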
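To make the Gaussian-copula loss in the third key point concrete, here is a minimal sketch for a single eye only: one binary HM label and one continuous AL value, rather than the paper's four-dimensional OU case. It assumes the network outputs a probability `p` for HM and a Gaussian mean `mu` and standard deviation `sigma` for AL, and that the two latent scores have correlation `rho`; `copula_nll` is a hypothetical name. The binary label is written as y_b = 1{z_b > -Φ⁻¹(p)}, so given the continuous score z_c the latent z_b is N(ρ·z_c, 1 - ρ²).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf() and inv_cdf()


def copula_nll(y_b, p, y_c, mu, sigma, rho, eps=1e-12):
    """Negative log-likelihood of one (binary, continuous) pair under a Gaussian copula.

    p: predicted P(y_b = 1), 0 < p < 1; mu, sigma: predicted mean / std of y_c;
    rho: assumed latent correlation between the two responses, |rho| < 1.
    """
    z_c = (y_c - mu) / sigma  # latent score of the continuous response
    # Continuous margin: log N(y_c; mu, sigma^2).
    log_f = -0.5 * z_c ** 2 - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    # P(y_b = 1 | z_c) = Phi((Phi^{-1}(p) + rho * z_c) / sqrt(1 - rho^2)).
    q = STD.cdf((STD.inv_cdf(p) + rho * z_c) / math.sqrt(1.0 - rho ** 2))
    q = min(max(q, eps), 1.0 - eps)  # guard against log(0)
    log_pb = math.log(q) if y_b == 1 else math.log(1.0 - q)
    return -(log_f + log_pb)


# Illustrative AL values in mm: an eye 1 mm longer than predicted, with p = 0.5.
print(copula_nll(1, 0.5, 26.0, 25.0, 1.0, 0.5) < copula_nll(1, 0.5, 26.0, 25.0, 1.0, 0.0))  # True
```

With `rho = 0` the loss reduces to binary cross-entropy plus the Gaussian negative log-likelihood, i.e. two independent task losses; a nonzero `rho` couples them. In the four-dimensional OU case, conditioning on the two continuous scores leaves a bivariate normal orthant probability for the two binary labels, which has no elementary closed form, so this sketch does not cover that case. In PyTorch, Φ and Φ⁻¹ are available as `torch.special.ndtr` and `torch.special.ndtri`.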

Abstract

The advancement of AI-assisted myopia screening necessitates the joint diagnosis of both-eye (OU) high myopia (HM) status and the prediction of axial length (AL). This clinical requirement introduces a complex mixed-type (binary-continuous) multitask learning problem with bi-domain (OU) image covariates, giving rise to two key challenges: i) capturing the inter-ocular asymmetry of OU images within a cutting-edge foundation model; ii) modeling and estimating the conditional dependence structure among mixed-type multivariate responses given image covariates. We address these challenges by: i) imposing residual adapters on the Vision Transformer foundation model to capture OU similarity and heterogeneity simultaneously; ii) developing a four-dimensional copula loss that is implementable in PyTorch based on a latent variable expression of the Gaussian copula likelihood, and proposing a computationally efficient fast Monte Carlo Expectation Maximization (fMCEM) algorithm to estimate the copula parameters. We further formulate a specific overfitting problem in multitask learning, called the stronger covariance phenomenon. We reveal how this phenomenon disturbs the estimation of copula parameters and theoretically demonstrate the numerical stability of the proposed fMCEM algorithm against it. The application to our annotated OU ultra-widefield fundus image dataset and simulations on synthetic data demonstrate that our method stably enhances predictive performance on both classification and regression tasks.
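This page does not specify the fMCEM algorithm itself, so the following is only a plain (not "fast") Monte Carlo EM sketch for the simplest analogue of the copula-parameter estimation step: one binary and one continuous response with known margins, estimating a single latent correlation ρ. The E-step imputes the unobserved binary latent z_b by drawing from its truncated conditional normal; the M-step rescales the expected covariance into a correlation, which is a common approximation rather than an exact constrained maximizer. `simulate`, `draw_truncnorm`, and `mcem_rho` are hypothetical names, and `simulate` generates data from the same latent-variable expression.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf() and inv_cdf()
EPS = 1e-12


def draw_truncnorm(mean, sd, lo, hi, k, rng):
    """k inverse-CDF draws from N(mean, sd^2) truncated to the interval (lo, hi)."""
    a = STD.cdf((lo - mean) / sd)  # cdf(-inf) == 0.0, cdf(inf) == 1.0
    b = STD.cdf((hi - mean) / sd)
    return [mean + sd * STD.inv_cdf(min(max(rng.uniform(a, b), EPS), 1.0 - EPS))
            for _ in range(k)]


def simulate(n, rho, p, rng):
    """Latent-variable expression: y_b = 1{z_b > c}, corr(z_b, z_c) = rho."""
    c = STD.inv_cdf(1.0 - p)  # threshold giving P(y_b = 1) = p
    data = []
    for _ in range(n):
        z_c = rng.gauss(0.0, 1.0)
        z_b = rho * z_c + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        data.append((1 if z_b > c else 0, z_c))
    return data, c


def mcem_rho(data, c, n_iter=30, n_draws=10, seed=0):
    """Monte Carlo EM for the latent correlation rho, assuming known margins."""
    rng = random.Random(seed)
    rho = 0.0
    for _ in range(n_iter):
        sd = math.sqrt(1.0 - rho ** 2)  # sd of z_b given z_c
        s_bc = s_bb = s_cc = 0.0
        for y_b, z_c in data:
            lo, hi = (c, math.inf) if y_b == 1 else (-math.inf, c)
            # E-step: Monte Carlo draws of the unobserved z_b given (y_b, z_c).
            draws = draw_truncnorm(rho * z_c, sd, lo, hi, n_draws, rng)
            s_bc += z_c * sum(draws) / n_draws
            s_bb += sum(d * d for d in draws) / n_draws
            s_cc += z_c * z_c
        # Approximate M-step: rescale the expected covariance to a correlation.
        rho = max(-0.99, min(0.99, s_bc / math.sqrt(s_bb * s_cc)))
    return rho


data, c = simulate(1000, rho=0.6, p=0.3, rng=random.Random(1))
rho_hat = mcem_rho(data, c)
print(round(rho_hat, 2))
```

In the paper's setting the margins would come from the network's per-image predictions and the copula parameter would be a 4×4 correlation matrix rather than a single ρ; this sketch fixes the margins to keep the example small.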
