Plug-and-Play Logit Fusion for Heterogeneous Pathology Foundation Models

arXiv cs.CV / 4/10/2026


Key Points

  • The paper proposes LogitProd, a plug-and-play logit-only fusion method that combines multiple independently trained pathology foundation model (FM) predictors as fixed experts while learning sample-adaptive fusion weights from slide-level outputs.
  • LogitProd avoids costly encoder retraining and feature-space alignment across heterogeneous backbones by operating purely on logits, making it easier to upgrade existing multi-model histopathology pipelines.
  • The authors provide a theoretical guarantee that the optimally weighted product fusion performs at least as well as the best individual expert under the training objective.
  • Across 22 pathology benchmarks covering WSI classification, tile classification, gene mutation prediction, and discrete-time survival modeling, LogitProd achieves top performance on 20/22 tasks and improves average results by about 3% over the strongest single expert.
  • Compared with feature-fusion approaches, the method reduces training cost by roughly 12×, addressing the practical model-selection bottleneck created by the proliferation of pathology FMs.
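To make the logit-only mechanism concrete, here is a minimal sketch of weighted product fusion: each expert contributes only its class logits, which are converted to log-probabilities and combined as a weighted geometric mean (a weighted sum in log space). This is an illustrative reconstruction, not the authors' code; the paper additionally learns sample-adaptive weights via a small gating module, which is omitted here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def logit_product_fusion(expert_logits, weights):
    """Fuse fixed experts purely from their logits.

    expert_logits: (n_experts, n_classes) raw logits, one row per expert.
    weights: (n_experts,) fusion weights summing to 1 (in LogitProd these
             would be predicted per sample; here they are fixed for clarity).
    Returns fused class probabilities of shape (n_classes,).
    """
    log_probs = np.log(softmax(expert_logits))        # per-expert log-probabilities
    fused = np.einsum("e,ec->c", weights, log_probs)  # weighted product in log space
    return softmax(fused)

# Two toy experts on a 3-class task; no encoder features or alignment needed.
logits = np.array([[2.0, 0.5, -1.0],
                   [1.0, 1.5,  0.0]])
probs = logit_product_fusion(logits, np.array([0.7, 0.3]))
```

Because the fusion consumes only slide-level outputs, a new expert can be added by appending one row of logits and one weight, without touching any backbone.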

Abstract

Pathology foundation models (FMs) have become central to computational histopathology, offering strong transfer performance across a wide range of diagnostic and prognostic tasks. The rapid proliferation of pathology foundation models creates a model-selection bottleneck: no single model is uniformly best, yet exhaustively adapting and validating many candidates for each downstream endpoint is prohibitively expensive. We address this challenge with a lightweight and novel model fusion strategy, LogitProd, which treats independently trained FM-based predictors as fixed experts and learns sample-adaptive fusion weights over their slide-level outputs. The fusion operates purely on logits, requiring no encoder retraining and no feature-space alignment across heterogeneous backbones. We further provide a theoretical analysis showing that the optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective. We systematically evaluate LogitProd on **22** benchmarks spanning WSI-level classification, tile-level classification, gene mutation prediction, and discrete-time survival modeling. LogitProd ranks first on 20/22 tasks and improves the average performance across all tasks by ~3% over the strongest single expert. LogitProd enables practitioners to upgrade heterogeneous FM-based pipelines in a plug-and-play manner, achieving multi-expert gains with ~12× lower training cost than feature-fusion alternatives.
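One way to see the best-expert guarantee informally (a sketch consistent with the abstract, not the paper's full proof): write the weighted product fusion of the experts' predictive distributions $p_i(y \mid x)$ as

$$
p_{\mathrm{fuse}}(y \mid x; w) \;\propto\; \prod_{i=1}^{E} p_i(y \mid x)^{\,w_i},
\qquad w_i \ge 0,\ \sum_i w_i = 1 .
$$

Choosing the one-hot weight vector $w = e_k$ recovers expert $k$ exactly, so for any training loss $\mathcal{L}$,

$$
\min_{w} \mathcal{L}\big(p_{\mathrm{fuse}}(\cdot\,; w)\big)
\;\le\; \min_{k} \mathcal{L}\big(p_k\big),
$$

i.e. the optimum over fusion weights is never worse than the best individual expert under that objective, because the single-expert solutions are feasible points of the fusion problem.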