SciLT: Long-Tailed Classification in Scientific Image Domains

arXiv cs.CV / 4/7/2026


Key Points

  • The paper studies long-tailed classification on scientific image domains and finds that conventional fine-tuning of foundation models brings only limited improvements when scientific data differs strongly from natural-image pretraining distributions.
  • Experiments on three scientific benchmarks show that features from the penultimate layer are especially important for performance on tail classes.
  • Based on these insights, the authors propose SciLT, which uses adaptive feature fusion and dual-supervision to combine representations from penultimate and final layers.
  • SciLT achieves more balanced accuracy across both head and tail classes and establishes a stronger baseline for adapting foundation models to scientific long-tailed tasks with large domain shifts.

Abstract

Long-tailed recognition has benefited from foundation models and fine-tuning paradigms, yet existing studies and benchmarks are mainly confined to natural image domains, where pre-training and fine-tuning data share similar distributions. In contrast, scientific images exhibit distinct visual characteristics and supervision signals, raising questions about the effectiveness of fine-tuning foundation models in such settings. In this work, we investigate scientific long-tailed recognition under a purely visual and parameter-efficient fine-tuning (PEFT) paradigm. Experiments on three scientific benchmarks show that fine-tuning foundation models yields limited gains, and reveal that penultimate-layer features play an important role, particularly for tail classes. Motivated by these findings, we propose SciLT, a framework that exploits multi-level representations through adaptive feature fusion and dual-supervision learning. By jointly leveraging penultimate- and final-layer features, SciLT achieves balanced performance across head and tail classes. Extensive experiments demonstrate that SciLT consistently outperforms existing methods, establishing a strong and practical baseline for scientific long-tailed recognition and providing valuable guidance for adapting foundation models to scientific data with substantial domain shifts.
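The abstract describes SciLT only at a high level: an adaptive fusion of penultimate- and final-layer features, trained with supervision on more than one branch. The sketch below is a minimal NumPy illustration of that general idea, not the paper's actual method; the sigmoid gate, the shared linear classifier `W_cls`, and the loss weight `lam` are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_fuse(f_pen, f_fin, W_g, b_g):
    """Gate-weighted fusion of penultimate- and final-layer features."""
    z = np.concatenate([f_pen, f_fin], axis=-1) @ W_g + b_g
    g = 1.0 / (1.0 + np.exp(-z))          # per-dimension gate in (0, 1)
    return g * f_pen + (1.0 - g) * f_fin  # convex combination of the two levels

def softmax_ce(logits, labels):
    """Mean cross-entropy over a batch of logits."""
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Toy dimensions: batch of 4, feature dim 8, 3 classes.
n, d, c = 4, 8, 3
f_pen = rng.standard_normal((n, d))   # penultimate-layer features
f_fin = rng.standard_normal((n, d))   # final-layer features
labels = rng.integers(0, c, size=n)

W_g, b_g = 0.1 * rng.standard_normal((2 * d, d)), np.zeros(d)
W_cls = 0.1 * rng.standard_normal((d, c))  # hypothetical shared classifier

fused = adaptive_fuse(f_pen, f_fin, W_g, b_g)

# Dual supervision in spirit: one loss on the fused features, one on the
# final-layer features, combined with an assumed weight lam.
lam = 0.5
loss = softmax_ce(fused @ W_cls, labels) + lam * softmax_ce(f_fin @ W_cls, labels)
```

Because the gate values lie in (0, 1), each fused feature is elementwise between the penultimate- and final-layer features, letting the model lean on lower-level representations where they help tail classes.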