Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length

arXiv cs.AI / 3/25/2026


Key Points

  • The paper evaluates how large language models perform on multi-instance processing (MIP) tasks where the model must handle many related inputs and then produce an aggregated result.
  • Experiments show a consistent failure mode: performance slightly degrades when instance counts are small (about 20–100), then sharply collapses as the number of instances increases.
  • Although context length correlates with the degradation, the analysis finds that instance count has a stronger impact on the final performance outcomes.
  • The authors conclude that optimization for MIP should focus on controlling instance count (and secondarily context length) to avoid the observed collapse at higher counts.

Abstract

Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over a number of instances. For example, analysing the overall sentiment of a number of movie reviews requires an LLM to process the sentiment of each review individually in order to provide a final aggregated answer. While LLM performance on such individual tasks is generally high, there has been little research on how LLMs perform when dealing with multi-instance inputs. In this paper, we perform a comprehensive evaluation of the multi-instance processing (MIP) ability of LLMs for tasks in which they excel individually. The results show that all LLMs follow a pattern of slight performance degradation for small numbers of instances (approximately 20–100), followed by a performance collapse on larger instance counts. Crucially, our analysis shows that while context length is associated with this degradation, the number of instances has a stronger effect on the final results. This finding suggests that when optimising LLM performance for MIP, attention should be paid to both context length and, in particular, instance count.
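The mitigation the paper points toward, keeping the number of instances per call low and aggregating across calls, can be sketched as a simple chunked-aggregation wrapper. Everything below is illustrative rather than the authors' method: `classify_batch` is a hypothetical stand-in for an LLM call (here a trivial keyword heuristic so the sketch runs), and the default chunk size of 20 merely echoes the lower end of the range where degradation reportedly begins.

```python
from collections import Counter

def classify_batch(reviews):
    """Hypothetical stand-in for an LLM call that labels a batch of reviews.
    A real implementation would prompt a model with the whole batch; here we
    use a trivial keyword heuristic so the sketch is self-contained."""
    return ["positive" if "good" in r.lower() else "negative" for r in reviews]

def chunked(seq, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def aggregate_sentiment(reviews, max_instances=20):
    """Classify reviews in small batches, so each call stays below the
    instance count where MIP performance reportedly collapses, then
    aggregate the per-batch labels into one overall verdict."""
    labels = []
    for batch in chunked(reviews, max_instances):
        labels.extend(classify_batch(batch))
    counts = Counter(labels)
    overall = counts.most_common(1)[0][0]
    return overall, counts

reviews = ["Good movie"] * 30 + ["Bad movie"] * 10
overall, counts = aggregate_sentiment(reviews)
```

The trade-off is the usual one for map-reduce style prompting: more calls, but each call stays in the regime where the model's per-instance accuracy is high.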