Call-Chain-Aware LLM-Based Test Generation for Java Projects

arXiv cs.AI / 4/27/2026


Key Points

  • The paper introduces CAT, a call-chain-aware, LLM-based method for generating Java unit tests that uses static analysis to add call-chain and dependency context to prompts.
  • CAT goes beyond execution-path-only prompting by modeling caller–callee relationships, object constructors, and third-party dependencies to help produce executable and semantically valid test contexts.
  • It includes an iterative test-fixing mechanism to recover from generation failures, improving robustness when tests initially cannot run.
  • On the Defects4J benchmark, CAT raises line coverage by 18.04% and branch coverage by 21.74% compared with the state-of-the-art approach PANTA.
  • CAT also performs better than the prior approach on four real-world GitHub projects released after the LLM cutoff date, and an ablation study confirms the value of call-chain and dependency contexts.
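
To make the idea of call-chain context more concrete, here is a minimal sketch of how callee information might be gathered and folded into a prompt. It is an illustration under stated assumptions, not CAT's actual implementation: the class `CallChainContext`, the methods `collectCallees` and `buildPrompt`, the depth limit, and the hand-written call graph are all hypothetical, and a real tool would derive the graph from static analysis of the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the paper.
class CallChainContext {

    // callGraph maps a method name to the methods it calls directly.
    // Returns every method reachable from focalMethod within maxDepth calls,
    // in breadth-first order, excluding focalMethod itself.
    static List<String> collectCallees(Map<String, List<String>> callGraph,
                                       String focalMethod, int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(focalMethod);
        List<String> frontier = List.of(focalMethod);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String method : frontier) {
                for (String callee : callGraph.getOrDefault(method, List.of())) {
                    if (seen.add(callee)) {
                        next.add(callee); // first time we reach this callee
                    }
                }
            }
            frontier = next;
        }
        seen.remove(focalMethod);
        return new ArrayList<>(seen);
    }

    // Assembles a plain-text prompt section from the collected context.
    static String buildPrompt(String focalMethod, List<String> callees,
                              List<String> constructors) {
        StringBuilder sb = new StringBuilder();
        sb.append("Focal method under test: ").append(focalMethod).append('\n');
        sb.append("Methods reachable through its call chain:\n");
        for (String callee : callees) {
            sb.append("- ").append(callee).append('\n');
        }
        sb.append("Constructors needed to build its inputs:\n");
        for (String ctor : constructors) {
            sb.append("- ").append(ctor).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-written call graph standing in for static-analysis output.
        Map<String, List<String>> callGraph = Map.of(
            "Order.total", List.of("Cart.items", "Tax.rate"),
            "Tax.rate", List.of("Region.lookup"),
            "Region.lookup", List.of("Db.query"));
        List<String> callees = collectCallees(callGraph, "Order.total", 2);
        System.out.print(buildPrompt("Order.total", callees,
            List.of("new Cart(List<Item> items)")));
    }
}
```

The breadth-first walk with a depth limit is one simple way to keep the prompt bounded when call chains are deep; the paper may select and rank context differently.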

Abstract

Large language models (LLMs) have recently shown strong potential for generating project-level unit tests. However, existing state-of-the-art approaches primarily rely on execution-path information to guide prompt construction, which is often insufficient for complex software systems with rich inter-class dependencies, deep call chains, and intricate object initialization requirements. In this paper, we present CAT, a novel call-chain-aware LLM-based test generation approach that explicitly incorporates call-chain and dependency contexts into prompts through dedicated static analysis. To construct executable, semantically valid test contexts, CAT systematically models caller–callee relationships, object constructors, and third-party dependencies, and supports iterative test fixing when generation failures occur. We evaluate CAT on the widely used Defects4J benchmark and on four real-world GitHub projects released after the LLM's cutoff date. The results show that, across projects in Defects4J, CAT improves line and branch coverage by 18.04% and 21.74%, respectively, over the state-of-the-art approach PANTA, while consistently achieving superior performance on post-cutoff real-world projects. An ablation study further demonstrates the importance of call-chain and dependency contexts in CAT.
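
The iterative test fixing mentioned in the abstract can be pictured as a generate, run, and repair loop. The sketch below is a guess at the general shape of such a loop, not the paper's mechanism: `IterativeTestFixer`, `generateWithRepair`, the attempt limit, and the stubbed generator and runner are all hypothetical stand-ins for an LLM call and a compile-and-execute step.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch; names and loop structure are assumptions.
class IterativeTestFixer {

    // generator: (prompt, feedback from the last failure, or "") -> candidate test source
    // runner:    candidate test source -> empty if it compiles and runs, else an error message
    // Returns the first candidate that runs, or empty if all attempts fail.
    static Optional<String> generateWithRepair(
            String prompt,
            BiFunction<String, String, String> generator,
            Function<String, Optional<String>> runner,
            int maxAttempts) {
        String feedback = "";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = generator.apply(prompt, feedback);
            Optional<String> error = runner.apply(candidate);
            if (!error.isPresent()) {
                return Optional.of(candidate); // test compiled and ran
            }
            feedback = error.get(); // feed the failure into the next attempt
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stub "LLM": returns a broken test first, a fixed one once it sees feedback.
        BiFunction<String, String, String> generator =
            (prompt, feedback) -> feedback.isEmpty() ? "broken test" : "fixed test";
        // Stub runner: reports an error for the broken test only.
        Function<String, Optional<String>> runner =
            test -> test.startsWith("broken")
                ? Optional.of("error: cannot find symbol")
                : Optional.empty();
        System.out.println(generateWithRepair("prompt", generator, runner, 3));
    }
}
```

Passing the generator and runner in as functions keeps the loop independent of any particular LLM client or build tool, which also makes it easy to exercise with stubs as in `main`.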