An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations

arXiv cs.CL / 4/10/2026

Key Points

  • The paper empirically finds that LLMs frequently hallucinate library usage in NL-to-code tasks, producing references to non-existent library features in about 8.1% to 40% of responses.
  • It evaluates static analysis tools for detection and mitigation, reporting that they can detect roughly 16% to 70% of general errors and about 14% to 85% of library hallucinations, with results dependent on both the LLM and the dataset.
  • Manual investigation shows there are hallucination cases that static analysis is unlikely to catch, yielding an estimated upper bound of detectability/mitigation between 48.5% and 77%.
  • Overall, the study concludes static analysis is a relatively low-cost partial remedy for code library hallucinations, but it cannot fully solve the broader hallucination problem.

Abstract

Despite extensive research, Large Language Models continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1-40% of responses. One intuitive approach to detecting and mitigating hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16-70% of all errors, and 14-85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases that no static method could plausibly catch, placing an upper bound on their potential of 48.5-77%. Overall, we show that static analysis is a cheap method for addressing some forms of hallucination, and we quantify how far short of solving the problem it will always be.
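To make the idea concrete, here is a minimal sketch of the kind of static check the paper evaluates: resolving attribute accesses against the actually installed library to flag references to non-existent features. This is an illustrative stand-in, not the authors' tooling; the function name and the `math.cube_root` example are hypothetical.

```python
import ast
import importlib

def find_hallucinated_attributes(code: str, libraries: list[str]) -> list[str]:
    """Flag attribute accesses on known library modules that do not exist
    in the installed library. A crude illustration of a static check for
    hallucinated library features (not the paper's actual analysis)."""
    tree = ast.parse(code)
    flagged = []
    for node in ast.walk(tree):
        # Match patterns like `math.cube_root` where `math` is a tracked library.
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in libraries):
            module = importlib.import_module(node.value.id)
            if not hasattr(module, node.attr):
                flagged.append(f"{node.value.id}.{node.attr}")
    return flagged

# `math.sqrt` exists; `math.cube_root` is a plausible-sounding hallucination.
snippet = "import math\nx = math.sqrt(4) + math.cube_root(8)"
print(find_hallucinated_attributes(snippet, ["math"]))  # ['math.cube_root']
```

A real checker would also need to handle aliased imports, `from`-imports, and dynamically constructed attributes, which hints at why the paper finds an upper bound well below 100%.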