Methods for Knowledge Graph Construction from Text Collections: Development and Applications

arXiv cs.AI / 3/30/2026


Key Points

  • The thesis addresses how to construct scalable, flexible knowledge graphs from rapidly growing collections of unstructured text across many domains, including news, social media, scholarly publications, and digital health records.
  • It argues that unlocking the value of text data requires combining information extraction based on NLP, ML, and generative AI with Semantic Web techniques to produce semantically transparent, explainable, and interoperable knowledge graphs.
  • The work evaluates and develops customized algorithms using NLP, Machine Learning, and GenAI approaches, producing benchmark results and reusable data resources in the form of knowledge graphs.
  • It demonstrates three application case studies: mapping discourse in global digital transformation content, analyzing trends in AECO research publications, and generating causal relation graphs for biomedical entities from EHRs and patient-authored drug reviews.

Abstract

Virtually every sector of society is experiencing dramatic growth in the volume of unstructured textual data that is generated and published, from news and online social media interactions, through open access scholarly communications, to observational data in the form of digital health records and online drug reviews. The volume and variety of data across this range of domains have created both unprecedented opportunities and pressing challenges for extracting actionable knowledge in several application scenarios. Extracting rich semantic knowledge, however, demands scalable and flexible automatic methods that adapt across text genres and schema specifications. Moreover, the full potential of these data can only be unlocked by coupling information extraction methods with Semantic Web techniques to construct full-fledged Knowledge Graphs that are semantically transparent, explainable by design, and interoperable. In this thesis, we experiment with applying Natural Language Processing, Machine Learning, and Generative AI methods, supported by Semantic Web best practices, to the automatic construction of Knowledge Graphs from large text corpora in three use cases: (1) the analysis of the Digital Transformation discourse in global news and social media platforms; (2) the mapping and trend analysis of recent research in the Architecture, Engineering, Construction and Operations domain from a large corpus of publications; and (3) the generation of causal relation graphs of biomedical entities from electronic health records and patient-authored drug reviews. The contributions of this thesis to the research community are benchmark evaluation results, customized algorithms, and data resources in the form of Knowledge Graphs, together with data analysis results built on top of them.
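To make the coupling of information extraction with Semantic Web techniques more concrete, the following is a minimal sketch of one possible final step: serializing already-extracted (subject, predicate, object) triples as N-Triples lines. It is not the thesis's pipeline. The namespace `http://example.org/kg/`, the helper names `iri` and `to_ntriples`, and the sample triples are all hypothetical and chosen only for illustration; the extraction step itself is assumed to have happened upstream.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/kg/"


def iri(label: str) -> str:
    """Turn a free-text label into an IRI in the hypothetical namespace."""
    local_name = quote(label.strip().replace(" ", "_"))
    return f"<{BASE}{local_name}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) string tuples as N-Triples lines."""
    lines = [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples]
    # Drop repeated statements while preserving first-seen order.
    return list(dict.fromkeys(lines))


# Made-up triples standing in for the output of an extraction step.
extracted = [
    ("drug A", "causes", "headache"),
    ("drug A", "causes", "headache"),  # same statement found in a second text
    ("drug A", "treats", "condition B"),
]

ntriples = to_ntriples(extracted)
print("\n".join(ntriples))
```

Giving every entity and relation an IRI is what lets statements extracted from different texts merge into a single graph; in practice the IRIs would more likely come from an existing ontology or vocabulary than from an ad hoc namespace like the one assumed here.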