BLAST: Benchmarking LLMs with ASP-based Structured Testing

arXiv cs.AI / 4/27/2026


Key Points

  • The paper introduces BLAST, the first dedicated benchmarking methodology and dataset for evaluating how accurately LLMs generate Answer Set Programming (ASP) code.
  • BLAST uses a structured evaluation framework that includes two new semantic metrics specifically designed to assess ASP code generation quality.
  • The authors report an empirical study testing eight state-of-the-art LLMs on ten well-known graph-related problems drawn from the ASP literature.
  • The work highlights a research gap: while LLMs perform strongly on many tasks, their effectiveness for declarative paradigms such as ASP has so far received comparatively little attention.
  • Results are presented as an initial evaluation using graph-centric ASP benchmarks, aiming to enable more rigorous and comparable future assessments of LLM-to-ASP generation.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across a broad spectrum of tasks, including natural language understanding, dialogue systems, and code generation. Despite this progress, comparatively little attention has been paid to their effectiveness in handling declarative paradigms such as Answer Set Programming (ASP). In this paper, we introduce BLAST, the first dedicated benchmarking methodology and associated dataset for evaluating the accuracy of LLMs in generating ASP code. BLAST provides a structured evaluation framework featuring two novel semantic metrics tailored to ASP code generation. The paper presents the results of an empirical evaluation involving ten well-established graph-related problems from the ASP literature and a diverse set of eight state-of-the-art LLMs.
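
To make the task concrete, below is a minimal sketch of what an LLM-to-ASP benchmark item can look like: an illustrative clingo-style encoding of graph 3-coloring (a typical graph problem, not necessarily one of BLAST's ten) and a naive answer-set-level comparison between a reference encoding and a candidate one. The `REFERENCE` program, the `answer_set_overlap` score, and the clingo-based harness are assumptions for illustration only; they are not the paper's benchmark problems or its two semantic metrics.

```python
"""Illustrative sketch: ASP code generation and answer-set-level comparison.
The encoding and the overlap score below are assumptions for illustration;
they are not BLAST's benchmark problems or its semantic metrics."""
import clingo

# A hand-written reference encoding of graph 3-coloring (clingo syntax),
# the kind of program an LLM would be prompted to produce.
REFERENCE = """
color(red). color(green). color(blue).
{ assign(N,C) : color(C) } = 1 :- node(N).
:- edge(X,Y), assign(X,C), assign(Y,C).
#show assign/2.
"""

# A small test instance: a triangle, which is 3-colorable.
INSTANCE = "node(1..3). edge(1,2). edge(2,3). edge(1,3)."


def answer_sets(program: str) -> set[frozenset[str]]:
    """Enumerate all answer sets of an ASP program as sets of shown atoms."""
    ctl = clingo.Control(["0"])  # "0" -> compute every model
    ctl.add("base", [], program)
    ctl.ground([("base", [])])
    models: set[frozenset[str]] = set()
    ctl.solve(on_model=lambda m: models.add(
        frozenset(str(a) for a in m.symbols(shown=True))))
    return models


def answer_set_overlap(reference: str, candidate: str, instance: str) -> float:
    """Hypothetical semantic score: Jaccard overlap of the answer sets the two
    encodings produce on one instance (NOT the paper's metric)."""
    ref = answer_sets(reference + instance)
    cand = answer_sets(candidate + instance)
    if not ref and not cand:
        return 1.0
    return len(ref & cand) / len(ref | cand)


if __name__ == "__main__":
    # Comparing the reference encoding against itself yields a perfect score.
    print(answer_set_overlap(REFERENCE, REFERENCE, INSTANCE))  # -> 1.0
```

In practice such a comparison would be run over several instances per problem; the point of the sketch is simply that ASP correctness is naturally judged at the level of answer sets rather than by surface-level similarity of the generated code.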