VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline

arXiv cs.AI / 4/14/2026


Key Points

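VeriTrans proposes a "reliability-first" pipeline that translates natural-language requirements into solver-executable logic (NL→PL) and raises reliability through symbolic verification (CNF compilation) and validator gating.
Alongside an instruction-tuned NL→PL translator, it uses PL→NL round-trip reconstruction as a high-precision acceptance condition, and supports auditing and reproducibility through a fixed configuration (temperature = 0; seed = 42 for fine-tuning runs) and logging of all artifacts (prompts, outputs, hashes).
On the 2,100 specifications of SatBench, it reports 94.46% SAT/UNSAT correctness and 87.73% median round-trip similarity.
Compact fine-tuning on 100–150 curated examples improves fidelity by about 1–1.5 pp without increasing latency (mean 25.8 s per specification on a 201-specification runtime subset).
An acceptance threshold (for example, τ = 75) controls the reliability–coverage trade-off: roughly 68% of items are retained while correctness on the accepted set stays at about 94%.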
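The reliability–coverage trade-off in the last point can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Item` record, the `gate` and `sweep` helpers, the 0 to 100 score scale, the choice of `>=` as the acceptance comparison, and the toy data, none of which come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """One translated specification (hypothetical record layout)."""
    round_trip_score: float  # PL->NL similarity, assumed to lie in [0, 100]
    correct: bool            # did the SAT/UNSAT verdict match the ground truth?


def gate(items: List[Item], tau: float) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on the accepted set) for threshold tau.

    An item is accepted when its round-trip score is at least tau.
    Accuracy is None when nothing is accepted.
    """
    accepted = [it for it in items if it.round_trip_score >= tau]
    coverage = len(accepted) / len(items) if items else 0.0
    if not accepted:
        return coverage, None
    accuracy = sum(it.correct for it in accepted) / len(accepted)
    return coverage, accuracy


def sweep(items: List[Item], taus: List[float]) -> Dict[float, Tuple[float, Optional[float]]]:
    """Evaluate the gate at several thresholds to trace the trade-off."""
    return {tau: gate(items, tau) for tau in taus}


# Toy data, invented purely to exercise the functions above.
ITEMS = [
    Item(92.0, True),
    Item(88.0, True),
    Item(81.5, True),
    Item(77.0, False),
    Item(74.9, True),
    Item(60.0, False),
]

if __name__ == "__main__":
    for tau, (coverage, accuracy) in sweep(ITEMS, [0, 75, 85, 95]).items():
        print(f"tau={tau}: coverage={coverage:.2f}, accuracy={accuracy}")
```

Raising `tau` shrinks the accepted set, and accuracy on that set rises only to the extent that the round-trip score actually predicts correctness, which is the property the reported figures depend on.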

Abstract

VeriTrans is a reliability-first ML system that compiles natural-language requirements into solver-ready logic with validator-gated reliability. The pipeline integrates an instruction-tuned NL→PL translator, round-trip reconstruction (PL→NL) used as a high-precision acceptance gate, and canonical PL→CNF compilation, all executed via fixed API configuration (temperature = 0; fine-tuning runs use seed = 42) and per-item artifact logging (prompts, outputs, hashes) to support auditability and replay-driven debugging. On SatBench (2,100 specifications), VeriTrans achieves 94.46% SAT/UNSAT correctness and 87.73% median round-trip similarity. Compact fine-tuning on 100–150 curated examples improves fidelity by about 1–1.5 pp without increasing latency (mean 25.8 s/spec on our 201-spec runtime subset). A thresholded acceptance policy on the round-trip score exposes a reliability–coverage knob: at τ = 75, roughly 68% of items are retained with ~94% correctness on the accepted set. Validator overhead contributes <15% of end-to-end runtime, and all prompts/responses and timing metadata are logged to enable replay-driven debugging and regression testing. By separating learned translation from symbolic verification and enforcing deterministic, validator-gated acceptance, VeriTrans turns NL→logic front-ends into auditable, reproducible components for reliability-critical workflows.
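The per-item artifact logging mentioned in the abstract (prompts, outputs, hashes, timing) might look roughly like the sketch below. The record fields, the function names (`sha256_hex`, `make_record`, `to_jsonl_line`, `verify`), the use of SHA-256, and the JSON Lines layout are all assumptions made for illustration; the abstract does not specify the log format or the hash function.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_record(item_id: str, prompt: str, response: str,
                config: Dict[str, Any], elapsed_s: float) -> Dict[str, Any]:
    """Build one audit record (hypothetical schema) for a single model call."""
    return {
        "item_id": item_id,
        "config": config,              # e.g. {"temperature": 0}
        "prompt": prompt,
        "response": response,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
        "elapsed_s": elapsed_s,        # timing metadata
    }


def to_jsonl_line(record: Dict[str, Any]) -> str:
    """Serialize with sorted keys so identical records give identical lines."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def verify(record: Dict[str, Any]) -> bool:
    """Check that the stored hashes still match the stored text."""
    return (record["prompt_sha256"] == sha256_hex(record["prompt"])
            and record["response_sha256"] == sha256_hex(record["response"]))


if __name__ == "__main__":
    # Placeholder prompt and response, invented for the example.
    rec = make_record("spec-0001", "Translate: if A then B", "A -> B",
                      {"temperature": 0}, 1.25)
    print(to_jsonl_line(rec))
    print(verify(rec))
```

Storing hashes next to the text lets a replay run confirm that a logged prompt or response has not been altered before it is compared against fresh output, which is one way to support the regression testing the abstract describes.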

