Cross-Granularity Representations for Biological Sequences: Insights from ESM and BiGCARP
arXiv cs.LG / 2026/3/24
Key points
- The paper explores how to integrate cross-granularity representations in biological sequence foundation models, contrasting symbolic granularity in language with hierarchical granularity in biology (nucleotides, amino acids, domains, genes).
- Using BiGCARP (Pfam domain-level) and ESM (amino-acid-level), the authors find that naive cross-model embedding initialization can fail, whereas embeddings taken from deeper layers are more contextual and transfer knowledge more faithfully.
- Representation analysis and probe tasks show that different granularity levels encode complementary biological information rather than redundant signals.
- The study demonstrates that combining representations across granularities produces measurable gains on intermediate-level prediction tasks and can improve interpretability.
- Overall, the work positions cross-granularity integration as a promising strategy for advancing biological foundation model performance and analysis.
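
As a rough illustration of what combining representations across granularities can look like, the sketch below concatenates each residue-level vector with the vector of the domain that covers it. It is not the paper's method: the function name `combine_granularities`, the span format, and the zero-fill for residues outside any domain are all assumptions made for this example, and the inputs are plain NumPy arrays standing in for ESM-style and BiGCARP-style embeddings.

```python
import numpy as np


def combine_granularities(residue_emb, domain_emb, domain_spans):
    """Concatenate each residue embedding with the embedding of its domain.

    Hypothetical helper, not taken from the paper.

    residue_emb:  (L, d_res) array, one row per amino acid.
    domain_emb:   (n_domains, d_dom) array, one row per annotated domain.
    domain_spans: list of (start, end) residue indices, end exclusive,
                  one per row of domain_emb.
    Residues outside every domain receive a zero domain vector (an assumption).
    """
    length = residue_emb.shape[0]
    d_dom = domain_emb.shape[1]

    # Broadcast each domain vector over the residues it covers.
    per_residue_domain = np.zeros((length, d_dom), dtype=residue_emb.dtype)
    for (start, end), vec in zip(domain_spans, domain_emb):
        per_residue_domain[start:end] = vec

    # Fine-grained and coarse-grained features side by side, per residue.
    return np.concatenate([residue_emb, per_residue_domain], axis=1)


# Random placeholders standing in for real model outputs.
rng = np.random.default_rng(0)
residues = rng.normal(size=(10, 4))   # 10 residues, 4-dim residue features
domains = rng.normal(size=(2, 3))     # 2 domains, 3-dim domain features
combined = combine_granularities(residues, domains, [(0, 3), (5, 8)])
```

The combined array has one row per residue and `d_res + d_dom` columns, so it can be fed to a simple probe (for example, a linear classifier) to compare against probes trained on either granularity alone.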

