Faithful Autoformalization via Roundtrip Verification and Repair

arXiv cs.CL · April 29, 2026


Key Points

  • The paper addresses how to verify that an LLM’s natural-language-to-formal translation is faithful, proposing a roundtrip workflow with formal equivalence checking.
  • The method formalizes a statement, translates the formal output back to natural language, re-formalizes it, and then uses a formal tool to check logical equivalence.
  • If the two formalizations agree, the approach provides evidence of faithfulness; if they disagree, a diagnostic step pinpoints the failing stage and a targeted repair operator attempts to fix it.
  • Experiments on 150 traffic rules using Claude Opus 4.6 and GPT-5.2 show diagnosis-guided repair improves formal equivalence from about 45–61% to 83–85%, beating a random-repair baseline.
  • An independent NLI analysis finds that higher formal equivalence is associated with less semantic drift, supporting the approach’s effectiveness beyond pure logical checks.
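The core check in the workflow above is deciding whether two formalizations are logically equivalent. For propositional formulas this can be done exhaustively; the sketch below uses a truth-table check over a toy traffic rule (the rule, variable encoding, and function names are illustrative, not taken from the paper):

```python
from itertools import product

def equivalent(f1, f2, n_vars):
    """Check propositional equivalence by truth-table enumeration.

    f1, f2: callables taking a tuple of booleans (one per variable).
    Returns True iff the formulas agree on every assignment.
    """
    return all(f1(v) == f2(v) for v in product([False, True], repeat=n_vars))

# Two formalizations of "if the light is red, the vehicle must stop",
# with v[0] = red, v[1] = stop:
orig = lambda v: (not v[0]) or v[1]            # red -> stop
rt = lambda v: v[0] or (not v[0])  and True and ((not v[0]) or v[1])  # noqa: placeholder
rt = lambda v: (not (not v[1])) or (not v[0])  # contrapositive form: ~stop -> ~red
print(equivalent(orig, rt, 2))     # True: roundtrip preserved the meaning

# A drifted roundtrip that kept only "the vehicle must stop":
drift = lambda v: v[1]
print(equivalent(orig, drift, 2))  # False: flags an unfaithful formalization
```

In the paper's setting the check is delegated to a formal tool rather than enumeration, but the contract is the same: agreement is evidence of faithfulness, disagreement triggers diagnosis.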

Abstract

When an LLM formalizes natural language, how do we know the output is faithful? We propose a roundtrip verification approach that does not require ground-truth annotations: formalize a statement, translate the result back to natural language, re-formalize, and use a formal tool to check logical equivalence. When the two formalizations agree, this provides evidence of a faithful formalization. When they disagree, a diagnosis step identifies which translation stage failed, and a targeted repair operator attempts to correct that stage. We evaluate our approach on 150 traffic rules using Claude Opus 4.6 and GPT-5.2. Diagnosis-guided repair raises formal equivalence from 45–61% to 83–85% for both models, outperforming a random-repair baseline. An independent NLI analysis confirms that formal equivalence is correlated with less semantic drift.
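The control flow in the abstract can be sketched as a loop: run the roundtrip, and on disagreement let a diagnosis step blame a stage and a repair operator replace it. Everything below is a hypothetical illustration with toy string stubs standing in for LLM calls and the formal tool; none of these names are the paper's API:

```python
def roundtrip_repair(statement, stages, check_equiv, diagnose, repair, max_rounds=3):
    """Formalize -> verbalize -> re-formalize; on mismatch, diagnose the
    failing stage and apply a targeted repair, up to max_rounds attempts."""
    f1 = stages["formalize"](statement)
    for _ in range(max_rounds):
        back = stages["verbalize"](f1)       # formal -> natural language
        f2 = stages["formalize"](back)       # re-formalize the back-translation
        if check_equiv(f1, f2):
            return f1, True                  # roundtrip agrees: evidence of faithfulness
        failing = diagnose(statement, f1, back, f2)
        stages[failing] = repair(failing, stages[failing])  # targeted fix
    return f1, False

# Toy instantiation: a buggy verbalizer drops the rule's antecedent,
# diagnosis blames it, and repair swaps in a correct one.
good_formalize = lambda t: "red -> stop" if t.startswith("if") else t
buggy_verbalize = lambda f: "stop"                    # loses "if red"
good_verbalize = lambda f: "if red then stop"

stages = {"formalize": good_formalize, "verbalize": buggy_verbalize}
check = lambda a, b: a == b                           # string equality as stand-in
diagnose = lambda s, f1, back, f2: "verbalize"        # pretend diagnosis output
repair = lambda name, fn: good_verbalize              # targeted repair operator

result, ok = roundtrip_repair("if red then stop", stages, check, diagnose, repair)
print(result, ok)  # red -> stop True
```

The targeted repair is what separates this from the random-repair baseline: diagnosis narrows the fix to the stage that actually drifted instead of re-rolling the whole pipeline.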