OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing

arXiv cs.AI / 4/6/2026

Key Points

  • The paper proposes OntoKG, an ontology-oriented knowledge graph construction method where schema design is treated as a first-class, reusable product rather than a byproduct of pipeline code.
  • Its key technique, intrinsic-relational routing, classifies each property as intrinsic or relational and routes it to the appropriate schema module to produce a declarative, backend-portable schema.
  • The approach is demonstrated on the January 2026 Wikidata dump, using rule-based cleaning to derive a 34.6M-entity core set and then iteratively routing properties into 94 modules across 8 categories.
  • With tool-augmented LLM assistance and human review, the resulting schema reportedly reaches 93.3% category coverage and 98.0% module assignment among classified entities; the exported graph contains 34.0M nodes and 61.2M edges across 38 relationship types.
  • The authors validate reusability with five applications that consume the exported schema independently of the construction pipeline: ontology structure analysis, benchmark annotation auditing, entity disambiguation, domain customization, and LLM-guided extraction.
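
As a rough illustration of the routing idea in the key points above, the sketch below classifies properties as intrinsic or relational and groups them by module. The classification rule (entity-valued properties are relational, literal-valued ones are intrinsic), the `Property` type, and the module names `person` and `place` are assumptions made for illustration; the summary does not state the paper's actual criteria or module names.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed rule, not taken from the paper: a property whose value is another
# entity is relational (it becomes an edge); a literal-valued property is
# intrinsic (it becomes a node attribute).
ENTITY_DATATYPES = {"wikibase-item"}


@dataclass(frozen=True)
class Property:
    pid: str        # Wikidata property identifier, e.g. "P569"
    datatype: str   # value datatype, e.g. "time" or "wikibase-item"
    module: str     # hypothetical schema module name


def classify(prop: Property) -> str:
    """Return "relational" for entity-valued properties, else "intrinsic"."""
    return "relational" if prop.datatype in ENTITY_DATATYPES else "intrinsic"


def route(props):
    """Group property ids by module and kind into a plain, declarative dict."""
    schema = defaultdict(lambda: {"intrinsic": [], "relational": []})
    for prop in props:
        schema[prop.module][classify(prop)].append(prop.pid)
    return dict(schema)


props = [
    Property("P569", "time", "person"),          # date of birth
    Property("P19", "wikibase-item", "person"),  # place of birth
    Property("P1082", "quantity", "place"),      # population
]
schema = route(props)
print(schema)
```

The result is ordinary data rather than pipeline code, which is the property the key points emphasize: it can be serialized and handed to another tool without the routing logic that produced it.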

Abstract

Organizing a large-scale knowledge graph into a typed property graph requires structural decisions -- which entities become nodes, which properties become edges, and what schema governs these choices. Existing approaches embed these decisions in pipeline code or extract relations ad hoc, producing schemas that are tightly coupled to their construction process and difficult to reuse for downstream ontology-level tasks. We present an ontology-oriented approach in which the schema is designed from the outset for ontology analysis, entity disambiguation, domain customization, and LLM-guided extraction -- not merely as a byproduct of graph building. The core mechanism is intrinsic-relational routing, which classifies every property as either intrinsic or relational and routes it to the corresponding schema module. This routing produces a declarative schema that is portable across storage backends and independently reusable. We instantiate the approach on the January 2026 Wikidata dump. A rule-based cleaning stage identifies a 34.6M-entity core set from the full dump, followed by iterative intrinsic-relational routing that assigns each property to one of 94 modules organized into 8 categories. With tool-augmented LLM support and human review, the schema reaches 93.3% category coverage and 98.0% module assignment among classified entities. Exporting this schema yields a property graph with 34.0M nodes and 61.2M edges across 38 relationship types. We validate the ontology-oriented claim through five applications that consume the schema independently of the construction pipeline: ontology structure analysis, benchmark annotation auditing, entity disambiguation, domain customization, and LLM-guided extraction.
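
To make the "declarative, independently reusable" claim concrete, here is a minimal sketch of a consumer that loads an exported schema as plain JSON and uses it to split one entity's claims into node attributes and edges. The JSON layout, the `person` module name, and the `to_property_graph` helper are hypothetical; the abstract does not describe the actual export format.

```python
import json

# Hypothetical exported schema: plain data with no pipeline code attached.
SCHEMA_JSON = json.dumps({
    "person": {"intrinsic": ["P569"], "relational": ["P19"]},
})


def to_property_graph(entity_id, module, claims, schema):
    """Split one entity's claims into a node dict and a list of edge triples."""
    spec = schema[module]
    node = {"id": entity_id, "module": module}
    edges = []
    for pid, value in claims.items():
        if pid in spec["intrinsic"]:
            node[pid] = value                      # stored on the node
        elif pid in spec["relational"]:
            edges.append((entity_id, pid, value))  # becomes a typed edge
        # Properties absent from the schema are ignored in this sketch.
    return node, edges


# A consumer only needs the JSON, not the code that generated it.
schema = json.loads(SCHEMA_JSON)
node, edges = to_property_graph(
    "Q42",
    "person",
    {"P569": "1952-03-11", "P19": "Q350", "P31": "Q5"},
    schema,
)
print(node)
print(edges)
```

Because the consumer depends only on the serialized schema, the same file could in principle drive an exporter for a different storage backend or a downstream task such as disambiguation, which is the kind of reuse the abstract describes.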