Word Alignment-Based Evaluation of Uniform Meaning Representations

arXiv cs.CL / 3/30/2026

Key Points

  • The paper addresses a core challenge in evaluating graph-based representations of sentence meaning: competing UMR graphs of the same sentence can have different numbers of nodes, so it is not obvious which nodes should be compared with each other.
  • It proposes a node-matching evaluation algorithm that leverages UMR’s inherent node–word alignments to make comparisons between multiple UMRs more intuitive and interpretable.
  • The authors argue that prior evaluation methods (notably smatch, the de facto standard in AMR evaluation) favor whichever node mapping maximizes F1 over node relations and attributes, whether the similarity is intentional or accidental, so the attribute mismatches they report are of little use for detailed error analysis.
  • The proposed approach is positioned as avoiding the NP-hard search problem inherent in smatch, while making it easier to identify where representations actually diverge.
  • The work includes a freely available script implementing the method for practical use.
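
The alignment-based idea in the points above can be sketched as follows. This is a minimal Python illustration, not the authors' script: the dictionary format, the function name `match_by_alignment`, the node ids, and the greedy overlap rule are all assumptions made for the example. Each node carries the set of token indices it is aligned to, and nodes from two graphs are paired when they share aligned words.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy format: node id -> 0-based indices of the sentence tokens
# the node is aligned to. This is NOT the UMR file format or the paper's code.
Alignment = Dict[str, FrozenSet[int]]


def match_by_alignment(a: Alignment, b: Alignment) -> List[Tuple[str, str, int]]:
    """Greedily pair nodes of two graphs by the number of shared aligned words."""
    # Score every cross-graph pair that shares at least one aligned word.
    candidates = [
        (len(words_a & words_b), node_a, node_b)
        for node_a, words_a in a.items()
        for node_b, words_b in b.items()
        if words_a & words_b
    ]
    # Largest overlap first; ties are broken by node id so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_a: Set[str] = set()
    used_b: Set[str] = set()
    pairs: List[Tuple[str, str, int]] = []
    for overlap, node_a, node_b in candidates:
        # Each node may take part in at most one pair.
        if node_a not in used_a and node_b not in used_b:
            pairs.append((node_a, node_b, overlap))
            used_a.add(node_a)
            used_b.add(node_b)
    return pairs


# Tokens of "The cat chased a mouse" are indexed 0..4.
graph_1 = {"s1a": frozenset({0, 1}), "s1c": frozenset({2}), "s1m": frozenset({3, 4})}
graph_2 = {"x1": frozenset({1}), "x2": frozenset({2}), "x3": frozenset({4}), "x4": frozenset()}

pairs = match_by_alignment(graph_1, graph_2)
unmatched_in_2 = sorted(set(graph_2) - {node_b for _, node_b, _ in pairs})
print(pairs)
print(unmatched_in_2)
```

In this toy input every node of `graph_1` finds a partner through a shared word, while `x4`, which has no alignment, stays unmatched and would be reported separately. Building and sorting the candidate pairs takes polynomial time, and no search over alternative mappings is involved.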

Abstract

Comparison and evaluation of graph-based representations of sentence meaning is a challenge because competing representations of the same sentence may have different numbers of nodes, and it is not obvious which nodes should be compared to each other. Existing approaches favor the node mapping that maximizes the F1 score over node relations and attributes, regardless of whether the similarity is intentional or accidental; consequently, the identified mismatches in values of node attributes are not useful for any detailed error analysis. We propose a node-matching algorithm that allows comparison of multiple Uniform Meaning Representations (UMR) of one sentence and that takes advantage of node-word alignments, inherently available in UMR. We compare it with previously used approaches, in particular smatch (the de facto standard in AMR evaluation), and argue that sensitivity to word alignment makes the comparison of meaning representations more intuitive and interpretable, while avoiding the NP-hard search problem inherent in smatch. A script implementing the method is freely available.
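
To make the F1-maximizing criterion concrete, the sketch below scores two tiny graphs as sets of triples and tries every node mapping exhaustively. It is an illustration under stated assumptions only: the triple layout, the node ids (`g1`, `p1`, ...), and the helper names `f1_under_mapping` and `best_mappings` are hypothetical, and the exhaustive loop is not how the smatch tool searches. It also assumes the predicted graph has no more nodes than the gold graph.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, source node, target node or value)


def f1_under_mapping(gold: Set[Triple], pred: Set[Triple], mapping: Dict[str, str]) -> float:
    """F1 over triples after renaming predicted node ids via `mapping`."""
    renamed = {(rel, mapping.get(src, src), mapping.get(tgt, tgt)) for rel, src, tgt in pred}
    if not renamed and not gold:
        return 1.0
    matched = len(renamed & gold)
    # Equivalent to 2PR / (P + R) with P = matched/|renamed| and R = matched/|gold|.
    return 2 * matched / (len(renamed) + len(gold))


def best_mappings(gold: Set[Triple], pred: Set[Triple],
                  gold_nodes: List[str], pred_nodes: List[str]):
    """Try every injective mapping pred -> gold; return the top F1 and all mappings reaching it."""
    best_score, best = -1.0, []
    for targets in permutations(gold_nodes, len(pred_nodes)):
        mapping = dict(zip(pred_nodes, targets))
        score = f1_under_mapping(gold, pred, mapping)
        if score > best_score:
            best_score, best = score, [mapping]
        elif score == best_score:
            best.append(mapping)
    return best_score, best


gold = {
    ("instance", "g1", "chase-01"), ("instance", "g2", "cat"), ("instance", "g3", "mouse"),
    ("ARG0", "g1", "g2"), ("ARG1", "g1", "g3"),
}
# The prediction drops the cat and wrongly labels the mouse as ARG0.
pred = {("instance", "p1", "chase-01"), ("instance", "p2", "mouse"), ("ARG0", "p1", "p2")}

score, mappings = best_mappings(gold, pred, ["g1", "g2", "g3"], ["p1", "p2"])
print(score)     # 0.5
print(mappings)  # two mappings tie: p2 -> g2 (cat) and p2 -> g3 (mouse)
```

In this toy case two mappings reach the same F1 of 0.5: one pairs the predicted mouse node with the gold cat node because the `ARG0` edge happens to match, the other pairs it with the gold mouse node because the concept matches. The score alone cannot say which comparison is the intended one, and the number of candidate mappings grows factorially with graph size.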