Heterogeneous Graph Importance Scoring and Clustering with Automated LLM-based Interpretation

arXiv cs.LG / 5/6/2026

Key Points

  • The paper proposes an end-to-end, open-data pipeline to rank the importance of urban bridges using heterogeneous graph analysis, unsupervised clustering, and automated LLM-based interpretation.
  • It builds heterogeneous graphs from OpenStreetMap by combining bridges, road networks, buildings, and public facilities, then computes five social-impact indicators to form 52-dimensional bridge feature vectors.
  • The method applies UMAP for dimensionality reduction and HDBSCAN for density-based clustering to discover bridge “archetypes” across multiple cities.
  • Clusters are automatically interpreted with temperature-optimized LLMs (Elyza8b), aiming to generate policy-relevant insights without manual labeling.
  • The study reports validation on multi-city data, demonstrates transferability to new cities through configuration changes alone, and reports a roughly 40× computational optimization of the scoring workflow.
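
To make the graph-based scoring idea concrete, here is a minimal sketch of a typed graph and one removal-based indicator. It is an illustration under stated assumptions, not the paper's implementation: the toy nodes and edges, the node types, and the `hospital_access_penalty` function (how much the mean shortest-path distance from buildings to the nearest hospital grows when one bridge is removed) are all hypothetical, since this summary does not give the paper's indicator definitions.

```python
import heapq
from collections import defaultdict

# Toy heterogeneous graph: each node has a type, each edge a length in metres.
# All names and numbers are hypothetical and not taken from the paper.
NODE_TYPES = {
    "a": "building", "b": "building",
    "c": "junction", "d": "junction",
    "h": "hospital",
}
# Each edge: (u, v, length_m, bridge_id or None)
EDGES = [
    ("a", "c", 100.0, None),
    ("b", "c", 150.0, None),
    ("c", "d", 200.0, "bridge_1"),  # short river crossing
    ("c", "d", 900.0, None),        # long detour around the river
    ("d", "h", 50.0, None),
]

def build_adjacency(edges, removed_bridge=None):
    """Undirected adjacency list, optionally skipping one bridge's edges."""
    adj = defaultdict(list)
    for u, v, length, bridge in edges:
        if bridge is not None and bridge == removed_bridge:
            continue
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj

def distances_from(sources, adj):
    """Multi-source Dijkstra: distance from every reachable node to its nearest source."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def mean_hospital_distance(edges, removed_bridge=None):
    """Mean distance from buildings to the nearest hospital (inf if any is cut off)."""
    adj = build_adjacency(edges, removed_bridge)
    hospitals = [n for n, t in NODE_TYPES.items() if t == "hospital"]
    buildings = [n for n, t in NODE_TYPES.items() if t == "building"]
    dist = distances_from(hospitals, adj)
    return sum(dist.get(b, float("inf")) for b in buildings) / len(buildings)

def hospital_access_penalty(edges, bridge_id):
    """Increase in mean hospital distance caused by removing one bridge."""
    return mean_hospital_distance(edges, bridge_id) - mean_hospital_distance(edges)
```

Calling `hospital_access_penalty(EDGES, "bridge_1")` on this toy network compares routing over the bridge with routing over the detour. A real pipeline would derive the nodes and edges from OpenStreetMap extracts rather than a hand-written list.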

Abstract

Urban bridge networks are critical infrastructure whose disruption can cascade into severe impacts on transportation, emergency services, and economic activity. This paper presents a comprehensive methodology for assessing bridge importance through heterogeneous graph analysis, unsupervised clustering, and automated interpretation via large language models (LLMs). Our approach addresses three fundamental challenges: (1) quantifying multi-dimensional bridge importance using only open data sources, (2) discovering functional bridge archetypes across different cities, and (3) generating policy-relevant interpretations automatically. We construct heterogeneous graphs from OpenStreetMap (OSM) data incorporating bridges, road networks, buildings, and public facilities. Five social impact indicators are computed: transit desert score, hospital access score, isolation risk score, supply chain impact score, and green space access score. The resulting 52-dimensional feature vectors undergo dimensionality reduction via UMAP and density-based clustering via HDBSCAN. Discovered clusters are interpreted using temperature-optimized LLMs (Elyza8b, trained on a construction-domain corpus). The contributions are: (1) a complete open-data pipeline from OSM to actionable bridge importance rankings, (2) a five-indicator scoring methodology with 40× computational optimization, (3) a UMAP+HDBSCAN clustering framework validated on multi-city data, (4) an LLM interpretation methodology including temperature optimization and model selection rationale, and (5) transferability demonstration across cities via configuration-only adaptation.
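
The clustering stage in the abstract (feature vectors, then UMAP, then HDBSCAN) could look roughly like the sketch below. It assumes the third-party `umap-learn` and `hdbscan` packages plus NumPy. The z-score standardization step, the function names, and every parameter value (`n_components=5`, `min_cluster_size=10`, `seed=0`) are illustrative assumptions, not settings reported in the paper.

```python
import math

def standardize(rows):
    """Z-score each column so no single feature dominates distance computations.

    Constant columns (zero standard deviation) are left centred at zero.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def cluster_archetypes(features, n_components=5, min_cluster_size=10, seed=0):
    """Reduce bridge feature vectors with UMAP, then cluster them with HDBSCAN.

    `features` is a list of equal-length numeric rows, one per bridge
    (for example, 52 values per bridge). All defaults are assumptions.
    """
    # Third-party imports are local so that `standardize` has no dependencies.
    import numpy as np
    import umap      # provided by the umap-learn package
    import hdbscan   # provided by the hdbscan package

    X = np.asarray(standardize(features), dtype=float)
    embedding = umap.UMAP(
        n_components=n_components, random_state=seed
    ).fit_transform(X)
    labels = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size
    ).fit_predict(embedding)
    return embedding, labels  # HDBSCAN assigns label -1 to noise points
```

HDBSCAN does not require the number of clusters in advance and marks low-density points as noise, which suits exploratory archetype discovery where the number of bridge types is unknown. The cluster labels would then be summarized per cluster and handed to an LLM for interpretation, a step this sketch leaves out.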