Instruction set for the representation of graphs

arXiv cs.CL / 3/12/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

IsalGraph encodes any finite simple graph as a compact string using a nine-character instruction alphabet, executed by a small VM with a sparse graph, a circular doubly-linked list of node references, and two traversal pointers.
A key property is that every string over the alphabet decodes to a valid graph, and the authors present a GraphToString algorithm that encodes any connected graph in polynomial time, along with an exhaustive-backtracking variant that yields a canonical string by lexicographically selecting the smallest shortest string across all starting nodes and traversal orders.
The method is evaluated on five real-world graph benchmarks (IAM Letter LOW/MED/HIGH, LINUX, AIDS) and shown that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance, enabling isomorphism-invariant representations compatible with language models.
The work highlights direct applications in graph similarity search, graph generation, and graph-conditioned language modelling.

Abstract

We present IsalGraph, a method for representing the structure of any finite, simple graph as a compact string over a nine-character instruction alphabet. The encoding is executed by a small virtual machine comprising a sparse graph, a circular doubly-linked list (CDLL) of graph-node references, and two traversal pointers. Instructions either move a pointer through the CDLL or insert a node or edge into the graph. A key design property is that every string over the alphabet decodes to a valid graph, with no invalid states reachable. A greedy \emph{GraphToString} algorithm encodes any connected graph into a string in time polynomial in the number of nodes; an exhaustive-backtracking variant produces a canonical string by selecting the lexicographically smallest shortest string across all starting nodes and all valid traversal orders. We evaluate the representation on five real-world graph benchmark datasets (IAM Letter LOW/MED/HIGH, LINUX, and AIDS) and show that the Levenshtein distance between IsalGraph strings correlates strongly with graph edit distance (GED). Together, these properties make IsalGraph strings a compact, isomorphism-invariant, and language-model-compatible sequential encoding of graph structure, with direct applications in graph similarity search, graph generation, and graph-conditioned language modelling