Ethics Testing: Proactive Identification of Generative AI System Harms

arXiv cs.AI / 4/27/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper highlights that generative AI systems, while popular due to tools like ChatGPT, can produce harmful or policy-violating content that causes serious downstream consequences.
  • It argues that current approaches to testing quality and safety, such as fairness testing, do not provide a systematic way to generate tests for detecting software harms in automatically generated outputs.
  • The authors introduce “ethics testing” as a new concept focused on systematically generating tests to identify harms triggered by unethical behavior, including harmful actions and violations of intellectual property rights.
  • The article discusses key challenges in designing and applying ethics testing, and demonstrates its feasibility through five case studies for generative AI systems.

Abstract

Generative Artificial Intelligence (GAI) systems, which can automatically generate content such as source code or images, have seen increasing popularity due to the emergence of tools such as ChatGPT that rely on Large Language Models (LLMs). Misuse of automatically generated content can have serious consequences because of potential harms in that content. Despite the importance of ensuring the quality of automatically generated content, there are few, if any, approaches that systematically generate tests for identifying software harms in the content produced by these GAI systems. In this article, we introduce the novel concept of ethics testing, which aims to systematically generate tests for identifying software harms. Unlike existing testing methodologies (e.g., fairness testing, which aims to identify software discrimination), ethics testing aims to systematically detect software harms induced by unethical behavior (e.g., harmful behavior or behavior that violates intellectual property rights) in automatically generated content. We introduce the concept of ethics testing, discuss the challenges it raises, and conduct five case studies to show how ethics testing can be performed for generative AI systems.