Enhancing Unsupervised Keyword Extraction in Academic Papers through Integrating Highlights with Abstract

arXiv cs.CL / 4/22/2026

Key Points

  • The paper studies how incorporating the “highlights” section of academic papers can improve unsupervised keyword extraction beyond using the abstract alone.
  • The authors evaluate three input settings—abstract only, highlights only, and a combined abstract+highlights input—using four unsupervised models.
  • Experiments on Computer Science (CS) and Library and Information Science (LIS) datasets show that combining abstract and highlights significantly boosts keyword extraction performance.
  • The work also analyzes how differences in keyword coverage and content between abstracts and highlights affect the resulting extracted keywords.
  • The authors release the data and code via the provided GitHub repository, supporting reproducibility and further research.
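
The three input settings can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`build_inputs`, `extract_keywords`, `f1_score`), the stopword list, and the sample text are hypothetical, and the frequency-based extractor is a toy stand-in rather than any of the four unsupervised models evaluated in the paper. The authors' actual pipeline is in their repository.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to",
             "we", "is", "are", "with", "that", "this", "from"}


def build_inputs(abstract, highlights):
    """Return the three input settings: abstract, highlights, and both."""
    highlights_text = " ".join(highlights)
    return {
        "abstract": abstract,
        "highlights": highlights_text,
        "abstract+highlights": abstract + " " + highlights_text,
    }


def extract_keywords(text, top_k=5):
    """Toy extractor: rank single words by frequency (stand-in model)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS and len(t) > 2]
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def f1_score(predicted, gold):
    """Exact-match F1 between predicted and gold keyword sets."""
    pred_set, gold_set = set(predicted), set(gold)
    hits = len(pred_set & gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up sample document, for illustration only.
    abstract = "We study keyword extraction from academic papers."
    highlights = ["Highlights complement the abstract.",
                  "Keyword extraction improves with highlights."]
    gold = ["keyword", "extraction", "highlights"]
    for name, text in build_inputs(abstract, highlights).items():
        print(name, f1_score(extract_keywords(text), gold))
```

Swapping `extract_keywords` for a real unsupervised model while keeping `build_inputs` fixed is what lets the three settings be compared on equal terms.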

Abstract

Automatic keyword extraction from academic papers is a key area of interest in natural language processing and information retrieval. Although previous research has mainly focused on using abstracts and references for keyword extraction, this paper focuses on the highlights section, a summary of the key findings and contributions that offers readers a quick overview of the research. Our observations indicate that highlights contain valuable keyword information that can effectively complement the abstract. To investigate the impact of incorporating highlights into unsupervised keyword extraction, we evaluate three input scenarios: the abstract alone, the highlights alone, and a combination of both. Experiments conducted with four unsupervised models on Computer Science (CS) and Library and Information Science (LIS) datasets reveal that integrating the abstract with highlights significantly improves extraction performance. Furthermore, we examine the differences in keyword coverage and content between abstracts and highlights, exploring how these variations influence extraction outcomes. The data and code are available at https://github.com/xiangyi-njust/Highlight-KPE.
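
One simple way to quantify keyword coverage is the fraction of author-assigned keywords that literally appear in a given text. The sketch below assumes case-insensitive substring matching with no stemming, which may differ from the matching used in the paper; the function names `keyword_coverage` and `only_in_highlights` are hypothetical.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so matching is layout-independent."""
    return " ".join(text.lower().split())


def keyword_coverage(text, gold_keywords):
    """Fraction of gold keywords found verbatim in the text (assumed metric)."""
    if not gold_keywords:
        return 0.0
    haystack = _normalize(text)
    found = [k for k in gold_keywords if _normalize(k) in haystack]
    return len(found) / len(gold_keywords)


def only_in_highlights(abstract, highlights_text, gold_keywords):
    """Gold keywords present in the highlights but absent from the abstract."""
    abs_norm = _normalize(abstract)
    hl_norm = _normalize(highlights_text)
    return [k for k in gold_keywords
            if _normalize(k) in hl_norm and _normalize(k) not in abs_norm]
```

Keywords returned by `only_in_highlights` are the ones that a combined input can surface but an abstract-only input cannot, which is one plausible reading of why the two sections complement each other.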