Demystifying the Silence of Correctness Bugs in PyTorch Compiler
arXiv cs.AI / 4/13/2026
📰 News
Key Points
- The paper argues that PyTorch’s torch.compile can produce silent correctness bugs—incorrect model outputs without exceptions or warnings—posing reliability risks for downstream LLM applications.
- According to community data cited in the paper, bugs that produce incorrect outputs account for 19.2% of high-priority torch.compile issues, making them the second most common category after crashes.
- It presents the first empirical characterization of torch.compile correctness bugs, analyzes their key characteristics, and evaluates how well existing fuzzers detect them.
- The authors introduce AlignGuard, a proof-of-concept testing technique that uses LLM-based test mutation, guided by the bug characteristics identified in the study, to improve detection of silent correctness failures.
- AlignGuard has reportedly found 23 previously unknown correctness bugs in recent torch.compile versions, all of which were confirmed or fixed by PyTorch and more than half of which were labeled high-priority.
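
To make "silent correctness bug" concrete, here is a minimal differential-testing sketch in plain Python. It is not AlignGuard and is not taken from the paper. It only illustrates the general idea of running a trusted reference and an optimized candidate on the same inputs and flagging outputs that differ without any exception being raised. All function names are hypothetical, and the "optimized" function carries a deliberately planted bug standing in for a miscompiled kernel.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def reference_leaky_relu(xs: Vector, slope: float = 0.01) -> Vector:
    """Trusted reference: leaky ReLU applied element-wise."""
    return [x if x >= 0 else slope * x for x in xs]


def buggy_fused_leaky_relu(xs: Vector, slope: float = 0.01) -> Vector:
    """Hypothetical 'optimized' version with a planted bug: it drops the
    negative slope, so negative inputs silently become 0.0. It never raises."""
    return [x if x >= 0 else 0.0 for x in xs]


def find_silent_mismatches(
    reference: Callable[[Vector], Vector],
    candidate: Callable[[Vector], Vector],
    inputs: Sequence[Vector],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-8,
) -> List[Vector]:
    """Return every input on which the candidate's output differs from the
    reference's output beyond the given tolerances."""
    mismatches = []
    for xs in inputs:
        expected = reference(xs)
        actual = candidate(xs)  # completes normally even when the result is wrong
        same_length = len(expected) == len(actual)
        all_close = same_length and all(
            math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
            for e, a in zip(expected, actual)
        )
        if not all_close:
            mismatches.append(xs)
    return mismatches


if __name__ == "__main__":
    test_inputs = [[1.0, 2.0], [0.0, 3.5], [-1.0, 2.0]]
    bad = find_silent_mismatches(
        reference_leaky_relu, buggy_fused_leaky_relu, test_inputs
    )
    print("inputs with silently wrong outputs:", bad)
```

In a PyTorch setting, the reference would typically be the eager model and the candidate the result of `torch.compile(model)`, with outputs compared using a tolerance-based check such as `torch.testing.assert_close`. Note that the planted bug only shows up on inputs containing negative values, which is why the choice of test inputs matters for detecting this class of failure.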