T-Norm Operators for EU AI Act Compliance Classification: An Empirical Comparison of Łukasiewicz, Product, and Gödel Semantics in a Neuro-Symbolic Reasoning System

arXiv cs.AI / March 31, 2026


Key Points

  • The paper presents an empirical comparison of three t-norm conjunction operators (Łukasiewicz, Product, and Gödel) within a neuro-symbolic reasoning system aimed at EU AI Act compliance classification.
  • Using the LGGT+ engine and a benchmark of 1,035 annotated AI system descriptions across four risk categories, the authors find statistically significant performance differences between operators (McNemar p<0.001).
  • Gödel (min-semantics) achieves the highest overall accuracy (84.5%) and best borderline recall (85%) but incurs a small false-positive rate (8 false positives, 0.8%) due to over-classification.
  • Łukasiewicz and Product produce zero false positives in this pilot but tend to miss borderline cases; Product outperforms Łukasiewicz (81.2% vs. 78.5%).
  • The study concludes that rule-base completeness matters more than operator choice, proposes a mixed-semantics classifier as the next productive step, and releases the LGGT+ core engine and the benchmark under Apache 2.0.
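The three operators compared above are the standard t-norm conjunctions from fuzzy logic; their textbook definitions make the behavioural differences in the key points concrete. A minimal sketch (illustrative only; the LGGT+ engine's internals are not shown in this summary):

```python
def t_lukasiewicz(a: float, b: float) -> float:
    """Łukasiewicz t-norm: T_L(a, b) = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def t_product(a: float, b: float) -> float:
    """Product t-norm: T_P(a, b) = a * b."""
    return a * b

def t_goedel(a: float, b: float) -> float:
    """Gödel (minimum) t-norm: T_G(a, b) = min(a, b)."""
    return min(a, b)

# On a borderline case where two rule antecedents each hold with
# degree 0.6, the operators diverge sharply:
a, b = 0.6, 0.6
print(round(t_lukasiewicz(a, b), 2))  # 0.2  -> conjunction nearly fails
print(t_product(a, b))                # 0.36 -> intermediate
print(t_goedel(a, b))                 # 0.6  -> as strong as the weakest premise
```

This divergence is consistent with the reported results: min-semantics keeps borderline conjunctions alive (higher recall, some over-classification), while Łukasiewicz and Product suppress them (zero false positives, missed borderline cases).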

Abstract

We present a first comparative pilot study of three t-norm operators -- Łukasiewicz (T_L), Product (T_P), and Gödel (T_G) -- as logical conjunction mechanisms in a neuro-symbolic reasoning system for EU AI Act compliance classification. Using the LGGT+ (Logic-Guided Graph Transformers Plus) engine and a benchmark of 1035 annotated AI system descriptions spanning four risk categories (prohibited, high_risk, limited_risk, minimal_risk), we evaluate classification accuracy, false positive and false negative rates, and operator behaviour on ambiguous cases. At n=1035, all three operators differ significantly (McNemar p<0.001). T_G achieves the highest accuracy (84.5%) and best borderline recall (85%), but introduces 8 false positives (0.8%) via min-semantics over-classification. T_L and T_P maintain zero false positives, with T_P outperforming T_L (81.2% vs. 78.5%). Our principal findings are: (1) operator choice is secondary to rule base completeness; (2) T_L and T_P maintain zero false positives but miss borderline cases; (3) T_G's min-semantics achieves higher recall at the cost of a 0.8% false positive rate; (4) a mixed-semantics classifier is the productive next step. We release the LGGT+ core engine (201/201 tests passing) and benchmark dataset (n=1035) under Apache 2.0.
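The abstract names a mixed-semantics classifier as the next step but does not specify its design. One plausible sketch, using the same t-norms (the decision rule below is a hypothetical illustration, not the paper's method): evaluate each rule's antecedents under both the strict Łukasiewicz conjunction and the permissive Gödel conjunction, and route disagreements to human review.

```python
from typing import List, Callable

def t_lukasiewicz(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)

def t_goedel(a: float, b: float) -> float:
    return min(a, b)

def conjoin(degrees: List[float], t_norm: Callable[[float, float], float]) -> float:
    """Fold a t-norm over all antecedent truth degrees of a rule."""
    result = 1.0  # 1.0 is the identity element of every t-norm
    for d in degrees:
        result = t_norm(result, d)
    return result

def mixed_classify(degrees: List[float], threshold: float = 0.5) -> str:
    """Hypothetical mixed-semantics decision rule: fire the rule only if
    the strict Łukasiewicz conjunction clears the threshold; if only the
    permissive Gödel conjunction does, flag the case as borderline."""
    strict = conjoin(degrees, t_lukasiewicz)
    permissive = conjoin(degrees, t_goedel)
    if strict >= threshold:
        return "fire"
    if permissive >= threshold:
        return "borderline"
    return "no_fire"

print(mixed_classify([0.9, 0.95]))  # both operators agree -> fire
print(mixed_classify([0.6, 0.7]))   # only Gödel clears 0.5 -> borderline
print(mixed_classify([0.3, 0.4]))   # neither clears 0.5 -> no_fire
```

A scheme along these lines would aim to keep the zero-false-positive behaviour of T_L on fired rules while recovering the borderline recall that T_G provides, trading automation for a review queue on ambiguous cases.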