Does Self-Consistency Improve the Recall of Encyclopedic Knowledge?

arXiv cs.CL / April 22, 2026

📰 News · Models & Research

Key Points

  • The study investigates whether self-consistency—an approach that samples multiple reasoning paths—improves a model’s recall of encyclopedic knowledge, which had been unclear due to missing evaluation benchmarks.
  • Researchers construct a targeted “knowledge recall” split for the MMLU benchmark using a data-driven heuristic, then validate it by comparing model behavior with GSM8K (symbolic reasoning) and MedMCQA (knowledge recall).
  • With this evaluation setup, self-consistency improves performance on both symbolic reasoning and encyclopedic knowledge recall, even though chain-of-thought prompting is mainly beneficial for symbolic reasoning.
  • The paper reports achieving 89% accuracy on MMLU with self-consistency using GPT-4o, setting a new state of the art for GPT-4o-based results at the time of the report.
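The self-consistency approach described above can be sketched as: sample several chain-of-thought completions at nonzero temperature, extract each final answer, and return the majority vote. The sketch below is a minimal illustration, not the paper's implementation; `sample_answer` is a hypothetical stand-in for a model call that returns one extracted final answer (e.g. a multiple-choice letter).

```python
from collections import Counter


def self_consistency(sample_answer, n_samples=10):
    """Return the majority-vote answer over n sampled reasoning paths.

    `sample_answer` is a hypothetical callable that runs a single
    chain-of-thought completion (temperature > 0) and returns the
    final extracted answer; ties are broken by first-seen order.
    """
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Toy demonstration with a stubbed "model" whose sampled answers vary:
votes = iter(["B", "C", "B", "D", "B"])
print(self_consistency(lambda: next(votes), n_samples=5))  # → B (3 of 5 votes)
```

The aggregation step is what distinguishes self-consistency from plain chain-of-thought prompting: a single sampled path may reason its way to a wrong answer, but the mode over many paths tends to be more reliable.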

Abstract

While self-consistency is known to improve performance on symbolic reasoning, its effect on the recall of encyclopedic knowledge is unclear due to a lack of targeted evaluation grounds. To address this, we establish such a knowledge recall split for the popular MMLU benchmark by applying a data-driven heuristic from prior work. We validate this split by showing that the performance patterns on the symbolic reasoning and knowledge recall subsets mirror those of GSM8K and MedMCQA, respectively. Using this solid ground, we find that self-consistency consistently improves performance across both symbolic reasoning and knowledge recall, even though its underlying CoT prompting is primarily effective for symbolic reasoning. As a result, we achieve an 89% accuracy on MMLU, the best performance to date with the use of GPT-4o.