BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels

arXiv cs.AI / 4/20/2026

Key Points

  • The paper addresses biomedical information retrieval by explicitly modeling domain semantics and hierarchical relationships among biomedical texts.
  • It proposes BioHiCL, a hierarchical multi-label contrastive learning approach that uses structured MeSH label hierarchies as supervision.
  • The authors argue that existing biomedical generative retrievers build on coarse binary relevance signals, which limits their ability to capture semantic overlap between texts.
  • BioHiCL comes in two model sizes, BioHiCL-Base (0.1B) and BioHiCL-Large (0.3B), which the authors report achieve promising performance on biomedical retrieval, sentence similarity, and question answering.
  • The authors present both models as computationally efficient enough for practical deployment.
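
To make the contrast between binary relevance and hierarchical supervision concrete, the sketch below scores the overlap between two documents' MeSH labels on a graded scale. It is an illustration only and not the paper's formulation, which this summary does not specify. It assumes labels are given as dot-separated MeSH tree numbers, expands each label to include its ancestors, and takes a Jaccard ratio over the expanded sets. The function names and the tree numbers in the example are hypothetical.

```python
from typing import Iterable, Set


def ancestors(tree_number: str) -> Set[str]:
    """Return a tree number together with all of its ancestor prefixes."""
    parts = tree_number.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def expand(labels: Iterable[str]) -> Set[str]:
    """Expand a label set so that it also contains every ancestor node."""
    expanded: Set[str] = set()
    for tree_number in labels:
        expanded |= ancestors(tree_number)
    return expanded


def hierarchical_overlap(labels_a: Iterable[str], labels_b: Iterable[str]) -> float:
    """Jaccard overlap of ancestor-expanded label sets, in [0, 1].

    Assumption for illustration: the paper may weight hierarchy levels
    differently or use another measure entirely.
    """
    a, b = expand(labels_a), expand(labels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative tree numbers (not real annotations of any document).
siblings = hierarchical_overlap({"C04.557.337"}, {"C04.557.386"})
unrelated = hierarchical_overlap({"C04.557.337"}, {"D27.505"})
identical = hierarchical_overlap({"C04.557.337"}, {"C04.557.337"})
```

Under these assumptions, two documents tagged with sibling terms share their common ancestors and receive a partial score, while a binary signal would treat them either as a full match or as unrelated.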

Abstract

Effective biomedical information retrieval requires modeling domain semantics and hierarchical relationships among biomedical texts. Existing biomedical generative retrievers build on coarse binary relevance signals, limiting their ability to capture semantic overlap. We propose BioHiCL (Biomedical Retrieval with Hierarchical Multi-Label Contrastive Learning), which leverages hierarchical MeSH annotations to provide structured supervision for multi-label contrastive learning. Our models, BioHiCL-Base (0.1B) and BioHiCL-Large (0.3B), achieve promising performance on biomedical retrieval, sentence similarity, and question answering tasks, while remaining computationally efficient for deployment.
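
The abstract does not give the training objective, so the following is only a minimal sketch of what a multi-label contrastive loss with graded supervision could look like. It assumes a batch of text embeddings and a pairwise weight matrix in [0, 1] (for example, a label-overlap score derived from MeSH annotations), and it generalizes a supervised contrastive loss by weighting each pair's log-probability by that score. The function name, the weighting scheme, and the temperature value are assumptions, not details from the paper.

```python
import numpy as np


def weighted_contrastive_loss(embeddings, weights, temperature=0.1):
    """Soft-weighted contrastive loss over a batch of at least two items.

    embeddings: (n, d) array of text embeddings.
    weights:    (n, n) array of pairwise label-overlap scores in [0, 1];
                the diagonal is ignored.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = logits.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Exclude self-similarity from the softmax denominator.
    logits = np.where(off_diag, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_prob = np.where(off_diag, log_prob, 0.0)

    # Normalize each anchor's weights so they sum to one over its partners.
    w = np.where(off_diag, weights, 0.0)
    w_sum = w.sum(axis=1)
    valid = w_sum > 0
    if not valid.any():
        return 0.0
    per_anchor = -(w * log_prob).sum(axis=1)[valid] / w_sum[valid]
    return float(per_anchor.mean())


# Toy batch: items 0 and 1 point one way, items 2 and 3 another.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
w_aligned = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
w_crossed = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)

loss_aligned = weighted_contrastive_loss(emb, w_aligned)
loss_crossed = weighted_contrastive_loss(emb, w_crossed)
```

In this toy batch the loss is lower when the heavily weighted pairs are also the ones whose embeddings agree, which is the behavior a graded objective of this kind is meant to encourage. With weights restricted to 0 and 1 it reduces to a binary positive/negative setup.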