From Untestable to Testable: Metamorphic Testing in the Age of LLMs

arXiv cs.AI / 3/27/2026

💬 Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The paper examines why software testing becomes harder as AI and LLM components, whose behavior is only partly reliable, are integrated into systems.
  • It highlights a key bottleneck: creating labeled ground-truth test oracles does not scale for LLM-driven functionality.
  • It proposes using metamorphic testing, which derives executable test oracles from expected relations across multiple executions rather than fixed labels.
  • The work frames metamorphic testing as a practical way to improve testability of LLM-influenced outputs by focusing on transformation-invariant or relation-based properties.

Abstract

This article discusses the challenges of testing software systems with increasingly integrated AI and LLM functionalities. LLMs are powerful but unreliable, and labeled ground truth for testing rarely scales. Metamorphic Testing solves this by turning relations among multiple test executions into executable test oracles.
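
To make the idea of a relation-based oracle concrete, here is a minimal sketch in Python. It is not taken from the paper: `toy_sentiment` is a hypothetical, deterministic stand-in for an LLM call, and the two transformations (`add_neutral_sentence`, `shout`) are illustrative metamorphic relations chosen for this example. The point is that no labeled expected output appears anywhere; the check only compares the output for a source input with the output for its transformed follow-up.

```python
from typing import Callable, Iterable, List, Tuple


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for an LLM-backed classifier (assumption, not a real API)."""
    positive = {"good", "great", "excellent", "love"}
    negative = {"bad", "poor", "terrible", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def add_neutral_sentence(text: str) -> str:
    """Transformation: append a sentence that should not change the sentiment."""
    return text + " The package arrived on Tuesday."


def shout(text: str) -> str:
    """Transformation: upper-case the input; the sentiment should stay the same."""
    return text.upper()


def check_invariance(
    system: Callable[[str], str],
    transform: Callable[[str], str],
    inputs: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Run source and follow-up inputs and return every case where outputs differ.

    Each violation is (source input, source output, follow-up output).
    No ground-truth label is needed: the relation itself is the oracle.
    """
    violations = []
    for source in inputs:
        out_source = system(source)
        out_follow_up = system(transform(source))
        if out_source != out_follow_up:
            violations.append((source, out_source, out_follow_up))
    return violations


if __name__ == "__main__":
    reviews = ["I love this phone.", "Terrible battery life."]
    for relation in (add_neutral_sentence, shout):
        print(relation.__name__, check_invariance(toy_sentiment, relation, reviews))
```

With a real LLM the outputs are not deterministic, so a practical version would likely compare outputs with a tolerance (for example, a similarity threshold or a majority vote over several runs) instead of strict equality; that choice is left open here.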