Improving MPI Error Detection and Repair with Large Language Models and Bug References

arXiv cs.AI / 4/6/2026


Key Points

  • The paper addresses the difficulty of detecting and repairing MPI (Message Passing Interface) program errors in high-performance computing and distributed training workflows.
  • It argues that directly using LLMs (e.g., ChatGPT) for this task performs poorly because the models lack task-specific knowledge about correct vs. incorrect MPI usage and known bug patterns.
  • The authors propose an LLM-based pipeline that combines Few-Shot Learning, Chain-of-Thought reasoning, and Retrieval Augmented Generation (RAG) with a “bug referencing” technique to improve accuracy.
  • Experiments report a major jump in error detection accuracy, from 44% with a direct-ChatGPT baseline to 77% with the proposed pipeline.
  • The bug-referencing approach is shown to generalize beyond the initial model, working well with other large language models.
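The paper does not include an implementation, but the pipeline the key points describe — retrieve known bug patterns relevant to a target MPI program, then splice them into a few-shot, chain-of-thought prompt — can be sketched in miniature. The corpus entries, function names, and the token-overlap scoring below are all illustrative stand-ins (a real RAG pipeline would use embedding similarity and a curated bug database):

```python
# Hedged sketch of a "bug referencing" retrieval step: rank entries from a
# small hand-made MPI bug-pattern corpus by token overlap with the target
# snippet, then assemble a prompt that asks for step-by-step (CoT) reasoning.
# All names and corpus entries are illustrative, not from the paper.

BUG_CORPUS = [
    {"id": "deadlock-matched-sends",
     "pattern": "Two ranks both call blocking MPI_Send before MPI_Recv, "
                "so each waits for the other: classic send-send deadlock."},
    {"id": "tag-mismatch",
     "pattern": "MPI_Recv uses a tag that no matching MPI_Send provides, "
                "so the receive blocks forever."},
    {"id": "missing-wait",
     "pattern": "MPI_Isend buffer is reused before MPI_Wait completes the "
                "request, corrupting the message."},
]

def tokenize(text: str) -> set:
    """Lowercase, punctuation-stripped token set (crude similarity basis)."""
    return {t.strip(".,:;()").lower() for t in text.split() if t.strip(".,:;()")}

def retrieve(snippet: str, corpus=BUG_CORPUS, k: int = 2):
    """Return the k corpus entries sharing the most tokens with the snippet."""
    snip = tokenize(snippet)
    scored = sorted(corpus,
                    key=lambda e: len(snip & tokenize(e["pattern"])),
                    reverse=True)
    return scored[:k]

def build_prompt(snippet: str) -> str:
    """Splice retrieved bug references into a chain-of-thought prompt."""
    refs = "\n".join(f"- [{e['id']}] {e['pattern']}" for e in retrieve(snippet))
    return (
        "Known MPI bug patterns (references):\n"
        f"{refs}\n\n"
        "Analyze the program step by step (which ranks block, on what, and "
        "why) before naming the bug and proposing a repair.\n\n"
        f"Program:\n{snippet}"
    )

snippet = "rank 0 and rank 1 both call MPI_Send first and then MPI_Recv"
print(build_prompt(snippet))
```

Running this retrieves the send-send deadlock pattern as the top reference, illustrating the core idea: the model is handed the task-specific MPI bug knowledge it otherwise lacks, rather than being asked to recall it unaided.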

Abstract

Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow). However, maintaining MPI programs remains challenging because of the complex interplay among processes and the intricacies of message passing and synchronization. With the advancement of large language models (LLMs) such as ChatGPT, it is tempting to adopt this technology for automated error detection and repair. Yet our studies reveal that directly applying LLMs yields suboptimal results, largely because these models lack essential knowledge about correct and incorrect usage, particularly the bug patterns found in MPI programs. In this paper, we design a bug detection and repair technique that combines Few-Shot Learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval Augmented Generation (RAG) to strengthen an LLM's ability to detect and repair MPI errors. These enhancements lead to a significant improvement in error detection accuracy, from 44% to 77%, compared to a baseline that uses ChatGPT directly. Additionally, our experiments demonstrate that our bug-referencing technique generalizes well to other large language models.