TDD Governance for Multi-Agent Code Generation via Prompt Engineering

arXiv cs.AI / 4/30/2026

Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The paper argues that LLMs speed up software development but, in unconstrained workflows, are often unstable, non-deterministic, and weak at following disciplined engineering practice.
  • It proposes an AI-native TDD framework that turns classical TDD (Red-Green-Refactor) into enforceable governance using structured prompt-level and workflow-level constraints.
  • The approach uses a machine-readable “manifesto” of extracted principles and applies them across planning, code generation, repair, and validation stages in a layered architecture.
  • It aims to improve stability and reproducibility by enforcing phase ordering, bounding repair-loop iterations, adding validation gates, and controlling atomic code mutations, with a deterministic engine (not the model) holding final authority over what gets applied.
  • The authors present architecture details and suggest that embedding software-engineering discipline into prompt orchestration could enable more reliable LLM-assisted development.
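
To make the workflow-level constraints in the points above concrete, here is a minimal Python sketch of a deterministic engine that enforces phase ordering, a bounded repair loop, validation gates, and all-or-nothing code commits. It is an illustration under stated assumptions, not the authors' implementation: the names `Phase`, `CycleResult`, `run_tdd_cycle`, `propose`, `tests_pass`, and `max_repairs` are all hypothetical, and `propose` merely stands in for an LLM call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    RED = "red"
    GREEN = "green"
    REFACTOR = "refactor"


# Hypothetical interfaces (assumptions, not taken from the paper):
# the proposer stands in for the model, the test runner for the validation gate.
Proposer = Callable[[Phase, str, int], str]  # (phase, current code, attempt) -> candidate
TestRunner = Callable[[str], bool]           # candidate code -> do the tests pass?


@dataclass
class CycleResult:
    code: str
    succeeded: bool
    log: List[str] = field(default_factory=list)


def run_tdd_cycle(initial_code: str, propose: Proposer, tests_pass: TestRunner,
                  max_repairs: int = 3) -> CycleResult:
    """Run one Red-Green-Refactor cycle; the engine, not the model, decides commits."""
    code, log = initial_code, []

    # RED gate: the tests must fail before any code is generated.
    if tests_pass(code):
        log.append("red: rejected (tests already pass)")
        return CycleResult(code, False, log)
    log.append("red: ok")

    # GREEN: bounded repair loop; a candidate replaces the code only if it passes.
    for attempt in range(1, max_repairs + 1):
        candidate = propose(Phase.GREEN, code, attempt)
        if tests_pass(candidate):
            code = candidate  # atomic commit: whole candidate or nothing
            log.append(f"green: accepted attempt {attempt}")
            break
        log.append(f"green: rejected attempt {attempt}")
    else:
        log.append("green: repair budget exhausted")
        return CycleResult(code, False, log)

    # REFACTOR: a single proposal, kept only if the tests still pass.
    candidate = propose(Phase.REFACTOR, code, 1)
    if tests_pass(candidate):
        code = candidate
        log.append("refactor: accepted")
    else:
        log.append("refactor: rejected, kept green code")
    return CycleResult(code, True, log)
```

The design choice worth noting is that the proposer never mutates state directly: every candidate passes through `tests_pass` first, and a failed or exhausted repair loop leaves the original code untouched.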

Abstract

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine authority. The system enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control to improve stability and reproducibility. We describe the architecture and discuss encoding software engineering discipline directly into prompt orchestration, which we think offers a promising direction for reliable LLM-assisted development.
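
The abstract does not show what the machine-readable manifesto looks like, so the following Python sketch is only one plausible shape for it. Everything here is an assumption made for illustration: the JSON schema, the three example principles (standard TDD rules, not the paper's extracted list), and the helper names `load_manifesto`, `constraints_for`, and `render_prompt`. It shows principles being tagged with the stages named in the abstract and then injected into a stage-specific prompt.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

# Stage names come from the abstract; the schema below is hypothetical.
STAGES = ("planning", "generation", "repair", "validation")

# Illustrative manifesto: example principles only, not the paper's actual list.
MANIFESTO_JSON = """
{
  "version": "0.1-illustrative",
  "principles": [
    {"id": "P1", "rule": "Write a failing test before any production code.",
     "stages": ["planning", "generation"]},
    {"id": "P2", "rule": "Write only the code needed to make the failing test pass.",
     "stages": ["generation", "repair"]},
    {"id": "P3", "rule": "Refactor only while all tests pass.",
     "stages": ["validation"]}
  ]
}
"""


@dataclass(frozen=True)
class Principle:
    id: str
    rule: str
    stages: Tuple[str, ...]


def load_manifesto(text: str) -> List[Principle]:
    """Parse the manifesto and reject principles bound to unknown stages."""
    principles = []
    for entry in json.loads(text)["principles"]:
        unknown = set(entry["stages"]) - set(STAGES)
        if unknown:
            raise ValueError(f"{entry['id']}: unknown stages {sorted(unknown)}")
        principles.append(Principle(entry["id"], entry["rule"], tuple(entry["stages"])))
    return principles


def constraints_for(stage: str, principles: List[Principle]) -> List[Principle]:
    """Select the principles that govern a given stage, in manifesto order."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return [p for p in principles if stage in p.stages]


def render_prompt(stage: str, task: str, principles: List[Principle]) -> str:
    """Build a stage-specific prompt that lists the applicable constraints."""
    lines = [f"Stage: {stage}", f"Task: {task}", "Constraints:"]
    lines += [f"- [{p.id}] {p.rule}" for p in constraints_for(stage, principles)]
    return "\n".join(lines)
```

For example, `render_prompt("repair", "fix add()", load_manifesto(MANIFESTO_JSON))` would yield a prompt whose constraint list contains only P2, since that is the only illustrative principle tagged for the repair stage.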