PrismaDV: Automated Task-Aware Data Unit Test Generation

arXiv cs.LG / 4/24/2026


Key Points

  • PrismaDV is introduced as an AI system that generates data unit tests by jointly analyzing downstream task code and dataset profiles rather than treating validation as task-agnostic.
  • The approach identifies data access patterns, infers implicit assumptions in the consuming code, and produces executable unit tests that better capture end-to-end effects of data errors.
  • PrismaDV also proposes SIFTA (Selective Informative Feedback for Task Adaptation), a prompt-optimization framework that adapts task-aware tests over time using sparse feedback from test and downstream execution outcomes.
  • In evaluations on two new benchmarks covering 60 tasks across five datasets, PrismaDV consistently outperforms both task-agnostic and task-aware baselines at generating unit tests that reflect the end-to-end impact of data errors.
  • The authors release the benchmarks and a prototype implementation, and show that SIFTA can learn module prompts that beat hand-written and generally optimized prompts.

Abstract

Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data. We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests over time to specific datasets and downstream tasks, we propose "Selective Informative Feedback for Task Adaptation" (SIFTA), a prompt-optimization framework that leverages the scarce outcomes from the execution of data unit tests and downstream tasks. We evaluate PrismaDV on two new benchmarks spanning 60 tasks across five datasets, where it consistently outperforms both task-agnostic and task-aware baselines in generating unit tests that reflect the end-to-end impact of data errors. Furthermore, we show that with SIFTA, we can automatically learn prompts for PrismaDV's modules that outperform prompts written by hand or generated from a generic prompt optimizer. We publicly release our benchmarks and prototype implementation.
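As a loose illustration of adapting module prompts from sparse execution outcomes (this is a simplified bandit-style loop for intuition only, not the SIFTA algorithm; the candidate prompts and simulated feedback are made up):

```python
import random

random.seed(0)

# Hypothetical candidates for one module's prompt, mapped to a simulated
# probability that a test generated under that prompt both executes and is
# confirmed by the downstream task outcome.
candidate_prompts = {
    "generic": 0.3,
    "task-aware": 0.8,
}

counts = {p: 0 for p in candidate_prompts}
wins = {p: 0 for p in candidate_prompts}

def sparse_feedback(prompt):
    # Sparse binary signal: 1 only when both the data unit test and the
    # downstream execution succeed (simulated here with a coin flip).
    return 1 if random.random() < candidate_prompts[prompt] else 0

for step in range(200):
    # Epsilon-greedy: occasionally explore, otherwise exploit the prompt
    # with the best observed success rate so far.
    if step < len(candidate_prompts) or random.random() < 0.1:
        p = random.choice(list(candidate_prompts))
    else:
        p = max(counts, key=lambda q: wins[q] / max(counts[q], 1))
    counts[p] += 1
    wins[p] += sparse_feedback(p)

best = max(counts, key=lambda q: wins[q] / max(counts[q], 1))
print("selected prompt:", best)
```

The sketch only captures the feedback signal's sparsity (one bit per test-plus-downstream run); SIFTA as described in the abstract operates on the prompts themselves rather than selecting from a fixed pool.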