LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models
arXiv cs.CL / 3/13/2026
Key Points
- The survey takes a data-driven, semi-automated approach to reviewing research on limitations of LLMs (LLLMs) from 2022 to early 2025, analyzing a corpus of 250,000 ACL and arXiv papers through keyword filtering, LLM-based classification, expert validation, and topic clustering with HDBSCAN+BERTopic and LlooM (see the sketch after this list).
- It reports that the share of LLM-related papers has grown fivefold in ACL and eightfold in arXiv since 2022, with limitations-focused (LLLM) papers accounting for over 30% of LLM papers by 2025.
- Reasoning is the most studied limitation, followed by generalization, hallucination, bias, and security; within the arXiv dataset, emphasis is shifting toward security risks, alignment, hallucination, knowledge editing, and multimodality.
- The authors publicly release the annotated-abstract dataset and their validated methodology on GitHub, enabling reproducibility and follow-up research.
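
To make the clustering stage concrete, here is a minimal Python sketch of abstract clustering with HDBSCAN and BERTopic, the tools the survey names. The embedding model, the `min_cluster_size` value, and the `cluster_abstracts` helper are illustrative assumptions, not the authors' actual configuration:

```python
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer


def cluster_abstracts(abstracts: list[str]) -> BERTopic:
    """Cluster paper abstracts into limitation topics with HDBSCAN + BERTopic."""
    # Sentence embeddings for each abstract (model choice is an assumption).
    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # HDBSCAN is density-based: it finds clusters of varying size and leaves
    # off-topic abstracts unassigned (topic -1) instead of forcing a fit.
    clusterer = HDBSCAN(
        min_cluster_size=15,  # illustrative value, not the paper's setting
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )

    # BERTopic wraps embedding + clustering and extracts keywords per cluster.
    topic_model = BERTopic(embedding_model=embedder, hdbscan_model=clusterer)
    topic_model.fit(abstracts)
    return topic_model


# Usage (hypothetical): `abstracts` would be the keyword-filtered corpus.
# model = cluster_abstracts(abstracts)
# print(model.get_topic_info())  # one row per discovered limitation topic
```

HDBSCAN is a natural fit for this kind of literature survey because it does not require fixing the number of topics in advance and can mark off-topic abstracts as noise rather than forcing every paper into a cluster.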
Related Articles
Is AI becoming a bubble, and could it end like the dot-com crash?
Reddit r/artificial

Externalizing State
Dev.to

I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game.
Dev.to

My AI Does Not Have a Clock
Dev.to

From Early Adopter to AI Instructor: Teaching 500 Engineers to Build with LLMs
Dev.to