Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning

arXiv cs.AI · April 22, 2026


Key Points

  • The paper proposes Mutualistic Neural Active Learning (MNAL), a cross-project framework for automatically identifying GitHub bug reports using human-machine collaboration.
  • MNAL trains a neural language model to generalize bug report patterns across different projects and uses active learning to decide what data to label next.
  • A key contribution is a mutualistic strategy between developers (human labelers) and the model: the most informative human-labeled reports are paired with corresponding pseudo-labeled examples to update the model, while the reports routed to humans for labeling are more readable and identifiable.
  • Experiments on a large-scale dataset show MNAL achieves up to 95.8% and 196.0% effort reduction in terms of readability and identifiability during human labeling, respectively, while also improving bug report identification accuracy over state-of-the-art baselines.
  • The approach is model-agnostic and demonstrated effectiveness not only quantitatively but also via a qualitative user study with 10 participants who reported time and cost savings.

Abstract

Bug reports, encompassing a wide range of bug types, are crucial for maintaining software quality. However, the increasing complexity and volume of bug reports make purely manual identification and assignment to the appropriate teams time-consuming and resource-intensive. In this paper, we introduce a cross-project framework, dubbed Mutualistic Neural Active Learning (MNAL), designed for automated and more effective identification of bug reports from GitHub repositories, boosted by human-machine collaboration. MNAL couples a neural language model, which learns and generalizes reports across different projects, with active learning to form neural active learning. A distinctive feature of MNAL is the purposely crafted mutualistic relation between the machine learner (neural language model) and the human labelers (developers) when enriching the learned knowledge: the most informative human-labeled reports and their corresponding pseudo-labeled counterparts are used to update the model, while the reports that developers must label are more readable and identifiable, thereby enhancing the human-machine teaming. We evaluate MNAL on a large-scale dataset against state-of-the-art approaches, baselines, and different variants. The results indicate that MNAL achieves up to 95.8% and 196.0% effort reduction in terms of readability and identifiability during human labeling, respectively, while delivering better performance in bug report identification. Additionally, MNAL is model-agnostic, as it improves performance with various underlying neural language models. To further verify the efficacy of our approach, we conducted a qualitative case study involving 10 human participants, who rated MNAL as more effective while saving time and monetary resources.