Copula-enhanced Vision Transformer for high myopia diagnosis through OU UWF fundus images

arXiv cs.CV / 5/4/2026

Key Points

The paper targets AI-assisted myopia screening by jointly performing two tasks from both-eye (OU) ultra-widefield fundus images: diagnosing OU high myopia (a binary outcome) and predicting axial length (a continuous outcome).
It proposes a Vision Transformer approach that uses residual adapters on a foundation model to capture both inter-ocular similarity and heterogeneity between the two eyes.
To handle the mixed binary–continuous multitask outputs, the authors introduce a four-dimensional copula loss that models conditional dependence via a Gaussian copula likelihood, implemented in PyTorch.
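They develop a computationally efficient fast Monte Carlo Expectation Maximization (fMCEM) algorithm for estimating the copula parameters. They also formalize a multitask overfitting problem, which they call the stronger covariance phenomenon, show that it disturbs copula parameter estimation, and theoretically demonstrate that fMCEM remains numerically stable under that disturbance.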
Experiments on an annotated OU ultra-widefield fundus dataset and on synthetic data show stable improvements in both classification and regression performance with the proposed method.
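
To make the mixed binary-continuous likelihood in the third point more concrete, here is a minimal sketch of the latent-variable idea in a reduced two-dimensional case (one continuous response, one binary response). It is not the paper's four-dimensional loss and it uses plain Python rather than PyTorch. The function name `mixed_copula_nll` and the parameters `mu`, `sigma`, `eta`, and `rho` are illustrative assumptions: the continuous response is modeled as `mu + sigma * e1`, the binary response as the indicator of `eta + e2 > 0`, and `(e1, e2)` as standard bivariate Gaussian with correlation `rho`.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixed_copula_nll(y_cont, y_bin, mu, sigma, eta, rho):
    """Negative log-likelihood of one (continuous, binary) pair.

    Assumed (hypothetical) latent model, not taken from the paper:
      y_cont = mu + sigma * e1
      y_bin  = 1 if eta + e2 > 0 else 0
      (e1, e2) ~ bivariate normal, unit variances, correlation rho
    """
    # Standardized residual of the continuous response.
    r = (y_cont - mu) / sigma
    # Gaussian negative log-density of the continuous part.
    nll_cont = math.log(sigma) + 0.5 * math.log(2.0 * math.pi) + 0.5 * r * r

    # Given e1 = r, the latent e2 is N(rho * r, 1 - rho^2), so
    # P(y_bin = 1 | y_cont) = Phi((eta + rho * r) / sqrt(1 - rho^2)).
    cond_sd = math.sqrt(1.0 - rho * rho)
    sign = 1.0 if y_bin == 1 else -1.0
    prob = std_normal_cdf(sign * (eta + rho * r) / cond_sd)
    prob = max(prob, 1e-12)  # guard against log(0)

    return nll_cont - math.log(prob)
```

With `rho = 0` the expression reduces to the sum of a Gaussian regression loss and a probit classification loss, so the correlation parameter is the only thing that couples the two tasks in this sketch.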

Abstract

The advancement of AI-assisted myopia screening necessitates the joint diagnosis of both-eye (OU) high myopia (HM) status and the prediction of axial length (AL). This clinical requirement introduces a complex mixed-type (binary-continuous) multitask learning task with bi-domain (OU) image covariates, giving rise to two key challenges: i) capturing the inter-ocular asymmetry of OU images within a cutting-edge foundation model; ii) modeling and estimating the conditional dependence structure among mixed-type multivariate responses given image covariates. We address these challenges by: i) imposing residual adapters on the Vision Transformer foundation model to capture the OU similarity and heterogeneity simultaneously; ii) developing a four-dimensional copula loss that is implementable in PyTorch based on a latent variable expression for the Gaussian copula likelihood, and proposing a computationally efficient fast Monte Carlo Expectation Maximization (fMCEM) algorithm to estimate copula parameters. We further formulate a specific overfitting problem in multitask learning, called the stronger covariance phenomenon. We show how this phenomenon disturbs the estimation of copula parameters and theoretically demonstrate the numerical stability of the proposed fMCEM algorithm against the disturbance. The application to our annotated OU ultra-widefield fundus image dataset and simulation on synthetic data demonstrate that our method stably enhances the predictive capabilities on both classification and regression tasks.
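
As a rough illustration of the Monte Carlo E-step that any MCEM scheme relies on, the sketch below approximates the conditional mean of a latent Gaussian variable behind a binary response. It is a generic textbook construction, not the paper's fMCEM algorithm. All names (`sample_truncated_latent`, `mc_e_step_mean`, `exact_truncated_mean`) and the setup are assumptions made for illustration: the latent variable is N(m, s^2) and the binary response is 1 exactly when the latent value exceeds `threshold`. The closed-form truncated-normal mean is included only as a reference point for the Monte Carlo estimate.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for cdf / pdf / inv_cdf


def sample_truncated_latent(m, s, threshold, y, rng):
    """Draw one latent value from N(m, s^2), truncated to the side
    of `threshold` implied by the observed binary response y."""
    p_thr = STD.cdf((threshold - m) / s)
    # Inverse-CDF sampling: restrict the uniform draw to the allowed tail.
    u = rng.uniform(p_thr, 1.0) if y == 1 else rng.uniform(0.0, p_thr)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1
    return m + s * STD.inv_cdf(u)


def mc_e_step_mean(m, s, threshold, y, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[latent | y] (illustrative E-step quantity)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += sample_truncated_latent(m, s, threshold, y, rng)
    return total / n_draws


def exact_truncated_mean(m, s, threshold, y):
    """Closed-form mean of the same truncated normal, for comparison."""
    alpha = (threshold - m) / s
    if y == 1:
        return m + s * STD.pdf(alpha) / (1.0 - STD.cdf(alpha))
    return m - s * STD.pdf(alpha) / STD.cdf(alpha)
```

In a full MCEM loop, averages like this one would replace intractable expectations in the E-step before the parameters are updated in the M-step. How the paper makes this step fast is not described in the abstract and is not reproduced here.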
