Construction of a Battery Research Knowledge Graph using a Global Open Catalog

arXiv cs.CL / 4/23/2026

Key Points

  • The paper introduces a pipeline to build an author-centric knowledge graph for battery research using OpenAlex as a global open bibliographic catalog.
  • For each author, it generates a weighted research-descriptor vector by combining broad OpenAlex concepts with fine-grained keyphrases extracted from paper titles and abstracts via KeyBERT, using ChatGPT (gpt-3.5-turbo) as the backend.
  • The method weights vector components based on descriptor origin, authorship position, and how recently the work appeared, and it is demonstrated on 189,581 battery-related publications.
  • The resulting representations enable author-author similarity, community detection, and exploratory search via a browser UI, and the graph is serialized to RDF and linked to Wikidata for interoperability with linked open data.
  • The authors claim this cross-institutional, semantics-based approach goes beyond prior analyses limited to institutional repositories and relies less on citation/co-authorship patterns alone.
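
The weighting and similarity steps in the points above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight tables, the exponential recency decay with a five-year half-life, the multiplicative combination of the three factors, and the function names are all assumptions made for this example, since the summary does not state the actual values or formula.

```python
import math
from collections import defaultdict

# All numbers below are illustrative assumptions, not values from the paper.
ORIGIN_WEIGHT = {"concept": 0.5, "keyphrase": 1.0}           # descriptor origin
POSITION_WEIGHT = {"first": 1.0, "middle": 0.5, "last": 0.8}  # authorship position
HALF_LIFE_YEARS = 5.0                                         # recency decay


def recency_weight(pub_year, ref_year):
    """Exponential decay: a work HALF_LIFE_YEARS old counts half as much."""
    age = max(0, ref_year - pub_year)
    return 0.5 ** (age / HALF_LIFE_YEARS)


def author_vector(works, ref_year):
    """Accumulate origin * position * recency for each descriptor of each work."""
    vec = defaultdict(float)
    for work in works:
        base = POSITION_WEIGHT[work["position"]] * recency_weight(work["year"], ref_year)
        for descriptor, origin in work["descriptors"]:
            vec[descriptor] += ORIGIN_WEIGHT[origin] * base
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two made-up authors sharing one broad concept but not their keyphrases.
alice = author_vector(
    [{"year": 2026, "position": "first",
      "descriptors": [("Electrochemistry", "concept"),
                      ("solid-state electrolyte", "keyphrase")]}],
    ref_year=2026,
)
bob = author_vector(
    [{"year": 2021, "position": "last",
      "descriptors": [("Electrochemistry", "concept"),
                      ("lithium dendrite", "keyphrase")]}],
    ref_year=2026,
)
similarity = cosine(alice, bob)
print(round(similarity, 3))
```

With these assumed weights, the two authors overlap only on the broad concept, so their similarity stays low. Giving keyphrases more weight than concepts is one way to keep coarse labels from dominating the comparison.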

Abstract

Battery research is a rapidly growing and highly interdisciplinary field, making it increasingly difficult to track relevant expertise and identify potential collaborators across institutional boundaries. In this work, we present a pipeline for constructing an author-centric knowledge graph of battery research built on OpenAlex, a large-scale open bibliographic catalogue. For each author, we derive a weighted research descriptors vector that combines coarse-grained OpenAlex concepts with fine-grained keyphrases extracted from titles and abstracts using KeyBERT with ChatGPT (gpt-3.5-turbo) as the backend model, selected after evaluating multiple alternatives. Vector components are weighted by research descriptor origin, authorship position, and temporal recency. The framework is applied to a corpus of 189,581 battery-related works. The resulting vectors support author-author similarity computation, community detection, and exploratory search through a browser-based interface. The knowledge graph is then serialized in RDF and linked to Wikidata identifiers, making it interoperable with external linked open data sources and extensible beyond the battery domain. Unlike prior author-centric analyses confined to institutional repositories, our approach operates at cross-institutional scale and grounds similarity in domain semantics rather than citation or co-authorship structure alone.
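
The RDF serialization and Wikidata linking mentioned in the abstract could look roughly like the sketch below, which writes N-Triples by hand so it needs no third-party library. The vocabulary choices (FOAF for people and interests, `skos:exactMatch` for the Wikidata link), the `http://example.org/descriptor/` namespace, and the placeholder identifiers `A0000000000` and `Q0000000` are assumptions for illustration only; the abstract does not specify the paper's actual schema.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
WIKIDATA = "http://www.wikidata.org/entity/"
DESCRIPTOR_NS = "http://example.org/descriptor/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def triple(subject, predicate, obj):
    return f"{subject} {predicate} {obj} ."


def author_triples(author_iri, name, descriptors):
    """Build N-Triples lines for one author.

    descriptors: list of (slug, wikidata_qid_or_None); slugs are assumed IRI-safe.
    """
    subject = iri(author_iri)
    lines = [
        triple(subject, iri(RDF_TYPE), iri(FOAF + "Person")),
        triple(subject, iri(FOAF + "name"), literal(name)),
    ]
    for slug, qid in descriptors:
        node = iri(DESCRIPTOR_NS + slug)
        lines.append(triple(subject, iri(FOAF + "topic_interest"), node))
        if qid is not None:
            # Link the local descriptor to its Wikidata entity when one is known.
            lines.append(triple(node, iri(SKOS_EXACT_MATCH), iri(WIKIDATA + qid)))
    return lines


nt_lines = author_triples(
    "https://openalex.org/A0000000000",  # placeholder author ID
    'Example "Ex" Author',
    [("solid-state-electrolyte", "Q0000000"),  # placeholder QID
     ("lithium-dendrite", None)],
)
print("\n".join(nt_lines))
```

This sketch leaves out the descriptor weights. Attaching a weight to each author-descriptor link would need an intermediate node or reification, and the abstract does not say how the paper handles this.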