Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management

arXiv cs.AI / 3/30/2026


Key Points

  • The paper proposes a semi-automated knowledge engineering framework to build a domain-grounded, machine-readable Knowledge Graph for Total Airport Management by integrating symbolic knowledge engineering with generative LLMs.
  • It uses a scaffolded fusion approach where expert-curated structures guide LLM prompts to extract semantically aligned knowledge triples, addressing issues of terminology complexity and fragmented, siloed documentation.
  • The authors evaluate the method using the Google LangExtract library and show that document-level (longer context) processing can improve recovery of non-linear procedural dependencies versus localized segment-based inference.
  • To meet airport operations’ strict traceability and provenance needs, the framework combines probabilistic discovery with deterministic anchoring so every extraction remains verifiably linked to its source text.
  • An additional automated operationalization layer is introduced to turn unstructured textual corpora into complex operational workflow representations suitable for downstream tooling.
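The deterministic anchoring idea in the key points can be illustrated with a small sketch. It assumes, hypothetically, that each LLM-proposed triple carries a verbatim evidence string; the `Triple`, `AnchoredTriple`, `anchor`, and `anchor_all` names are illustrative only and are not LangExtract's API or the authors' implementation. A triple is kept only if its evidence occurs exactly in the source text, in which case its character offsets are recorded as provenance; anything the model paraphrased or invented is rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    evidence: str  # verbatim span the model cites as support (hypothetical field)


@dataclass(frozen=True)
class AnchoredTriple:
    triple: Triple
    start: int  # character offset where the evidence begins in the source
    end: int    # exclusive end offset


def anchor(triple: Triple, source: str) -> Optional[AnchoredTriple]:
    """Locate the evidence verbatim; return None if it cannot be grounded."""
    if not triple.evidence:
        return None
    start = source.find(triple.evidence)  # first exact occurrence only
    if start == -1:
        return None
    return AnchoredTriple(triple, start, start + len(triple.evidence))


def anchor_all(
    triples: list[Triple], source: str
) -> tuple[list[AnchoredTriple], list[Triple]]:
    """Split candidate triples into grounded and rejected lists."""
    anchored: list[AnchoredTriple] = []
    rejected: list[Triple] = []
    for triple in triples:
        result = anchor(triple, source)
        if result is None:
            rejected.append(triple)
        else:
            anchored.append(result)
    return anchored, rejected


source = (
    "The ground handler confirms chocks are in place. "
    "Deboarding starts only after the jet bridge is connected."
)
candidates = [
    Triple("deboarding", "requires", "jet bridge connected",
           "Deboarding starts only after the jet bridge is connected."),
    # Paraphrased or hallucinated evidence: not present in the source text.
    Triple("fueling", "requires", "fire crew on standby",
           "Fueling requires the fire crew on standby."),
]
anchored, rejected = anchor_all(candidates, source)
for a in anchored:
    print(a.triple.subject, a.triple.relation, a.triple.obj, (a.start, a.end))
print("rejected:", [t.subject for t in rejected])
```

Exact substring matching is the strictest possible anchor: it never attaches a triple to text that does not literally contain the evidence, at the cost of rejecting harmless paraphrases. Because `str.find` returns only the first occurrence, a fuller pipeline would likely need to disambiguate repeated phrases, for example by searching only within the segment the model was given.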

Abstract

Documentation of airport operations is inherently complex due to extensive technical terminology, rigorous regulations, proprietary regional information, and fragmented communication across multiple stakeholders. The resulting data silos and semantic inconsistencies present a significant impediment to the Total Airport Management (TAM) initiative. This paper presents a methodological framework for constructing a domain-grounded, machine-readable Knowledge Graph (KG) through a dual-stage fusion of symbolic Knowledge Engineering (KE) and generative Large Language Models (LLMs). The framework employs a scaffolded fusion strategy in which expert-curated KE structures guide LLM prompts to facilitate the discovery of semantically aligned knowledge triples. We evaluate this methodology using the Google LangExtract library and investigate the impact of context window utilization by comparing localized segment-based inference with document-level processing. Contrary to prior empirical observations of long-context degradation in LLMs, document-level processing improves the recovery of non-linear procedural dependencies. To ensure the high-fidelity provenance required in airport operations, the proposed framework fuses a probabilistic model for discovery and a deterministic algorithm for anchoring every extraction to its ground source. This ensures absolute traceability and verifiability, bridging the gap between "black-box" generative outputs and the transparency required for operational tooling. Finally, we introduce an automated framework that operationalizes this pipeline to synthesize complex operational workflows from unstructured textual corpora.
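As a rough illustration of how scaffolded triples might be turned into a workflow, the sketch below assumes a hypothetical expert-curated schema (`SCHEMA`, `ENTITY_TYPES`) that restricts which relations may link which entity types, discards triples that violate it, and orders the remaining `precedes` relations with Python's standard `graphlib.TopologicalSorter`. The turnaround steps, the schema, and the choice of topological sorting are all assumptions for illustration; the abstract does not specify how the authors synthesize workflows. The example includes a parallel branch (fueling alongside deboarding and cabin cleaning), the kind of non-linear procedural dependency that a strictly sequential list cannot represent.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical expert-curated scaffold: allowed relation -> (subject type, object type).
SCHEMA = {
    "precedes": ("Process", "Process"),
    "performed_by": ("Process", "Stakeholder"),
}

# Hypothetical typed vocabulary for a simplified aircraft turnaround.
ENTITY_TYPES = {
    "on-block": "Process",
    "deboarding": "Process",
    "cabin cleaning": "Process",
    "fueling": "Process",
    "boarding": "Process",
    "off-block": "Process",
    "ground handler": "Stakeholder",
}


def conforms(triple: tuple[str, str, str]) -> bool:
    """True if the relation is in the schema and both entity types match it."""
    subject, relation, obj = triple
    expected = SCHEMA.get(relation)
    return expected is not None and (
        ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj)
    ) == expected


def synthesize_workflow(triples: list[tuple[str, str, str]]) -> list[str]:
    """Order process steps using only schema-conformant 'precedes' triples."""
    sorter = TopologicalSorter()
    for subject, relation, obj in triples:
        if relation == "precedes" and conforms((subject, relation, obj)):
            sorter.add(obj, subject)  # subject is a predecessor of obj
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # A cycle means contradictory precedence claims; surface it for expert review.
        raise ValueError(f"contradictory precedence constraints: {err.args[1]}") from err


triples = [
    ("on-block", "precedes", "deboarding"),
    ("deboarding", "precedes", "cabin cleaning"),
    ("on-block", "precedes", "fueling"),  # parallel branch alongside deboarding/cleaning
    ("cabin cleaning", "precedes", "boarding"),
    ("fueling", "precedes", "boarding"),
    ("boarding", "precedes", "off-block"),
    ("deboarding", "performed_by", "ground handler"),  # valid, but not an ordering edge
    ("boarding", "precedes", "ground handler"),  # rejected: object is a Stakeholder
    ("fueling", "triggers", "boarding"),  # rejected: relation not in schema
]

print(synthesize_workflow(triples))
```

Because parallel steps have no mutual constraint, `static_order` may emit them in either order; any output that respects every `precedes` edge is a valid workflow. Contradictory precedence claims surface as a `ValueError` rather than as a silently wrong ordering, which fits the traceability goal described above.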