Team Fusion@SU @ BC8 SympTEMIST track: transformer-based approach for symptom recognition and linking

arXiv cs.CL / 4/9/2026


Key Points

  • The paper presents a transformer-based system for the SympTEMIST named entity recognition (NER) and entity linking (EL) tasks on symptom mentions.
  • For NER, the approach fine-tunes a RoBERTa-based token-level classifier extended with BiLSTM and CRF layers, trained on an augmented training set.
  • For entity linking, it generates candidates with the cross-lingual SapBERT XLMR-Large model and ranks them by cosine similarity against entries in a knowledge base.
  • The authors report that the choice of knowledge base has the largest impact on model accuracy.
  • The work is released on arXiv as a methods contribution for symptom-focused biomedical NLP pipelines.
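
To illustrate what the CRF layer in the NER bullet contributes, the following is a minimal sketch of Viterbi decoding, which picks the best-scoring tag sequence as a whole instead of taking the best tag for each token independently. It is not the authors' implementation: the tag names, emission scores (standing in for RoBERTa + BiLSTM outputs), and transition scores are all hypothetical values chosen for the example.

```python
# Illustrative only: tags, scores, and transition penalties are made up,
# not taken from the paper.
TAGS = ["O", "B-SYM", "I-SYM"]

def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at token t (e.g. from an encoder)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # per later token: best previous tag for each tag
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(ptrs)
    # Follow back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path

# O -> I-SYM is heavily penalised, so an entity cannot continue from "outside".
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B-SYM
    [0.0, 0.0, 0.0],    # from I-SYM
]
# Three tokens; on its own, token 2 slightly prefers I-SYM over B-SYM.
emissions = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],
    [0.0, 0.0, 2.0],
]
decoded = [TAGS[i] for i in viterbi(emissions, transitions)]
print(decoded)
```

With these toy numbers, a per-token argmax would label the second token I-SYM directly after O, which is not a valid BIO sequence; the transition penalty steers the decoder to B-SYM instead. In a trained model the transition scores are learned rather than written by hand.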

Abstract

This paper presents a transformer-based approach to solving the SympTEMIST named entity recognition (NER) and entity linking (EL) tasks. For NER, we fine-tune a RoBERTa-based (1) token-level classifier with BiLSTM and CRF layers on an augmented training set. Entity linking is performed by generating candidates using the cross-lingual SapBERT XLMR-Large (2), and calculating cosine similarity against a knowledge base. The choice of knowledge base proves to have the highest impact on model accuracy.
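
The entity-linking step described in the abstract can be sketched as a nearest-neighbour ranking by cosine similarity. The example below is a simplified illustration under stated assumptions, not the paper's code: the three-dimensional vectors stand in for SapBERT XLMR-Large embeddings, and the concept identifiers and the `rank_candidates` helper are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(mention_vec, kb, top_k=3):
    """Rank knowledge-base entries by similarity to a mention embedding.

    kb maps a concept identifier to its embedding vector.
    Returns the top_k (identifier, similarity) pairs, best first.
    """
    scored = [(code, cosine(mention_vec, vec)) for code, vec in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy knowledge base: placeholder IDs and made-up 3-d embeddings.
kb = {
    "CONCEPT_A": [1.0, 0.0, 0.0],
    "CONCEPT_B": [0.0, 1.0, 0.0],
    "CONCEPT_C": [1.0, 1.0, 0.0],
}
# Made-up embedding of a detected symptom mention.
mention = [0.9, 0.1, 0.0]

ranked = rank_candidates(mention, kb, top_k=2)
print(ranked)
```

Because the ranking can only return concepts that are present in `kb`, the contents of the knowledge base bound what the linker can get right, which is consistent with the abstract's observation that the choice of knowledge base matters most.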