Analysis and Explainability of LLMs Via Evolutionary Methods

arXiv stat.ML / 5/6/2026

Key Points

  • The paper proposes using evolutionary methods to analyze and explain large language models by mapping model weights to “genotypes” and generated text to “phenotypes.”
  • It argues that this genotype–phenotype correspondence can reveal model lineage, identify the roles of different layers, and clarify how important datasets shape model behavior.
  • In a controlled experiment, the authors show that estimated evolutionary trees can reliably recover the topology of a known ground-truth training tree.
  • The study also estimates which weight layers are most important from weight differences, and runs phenotypic experiments suggesting that one training dataset contributes more useful information than the others.
  • It extends the approach to construct an unsupervised evolutionary tree of black-box foundation models, supported by visualization tools to make relationships among LLMs easier to understand.
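
The weight-difference idea in the key points can be illustrated with a minimal sketch. The summary does not say how the authors score layers, so the relative L2 change used here is an assumption, and the layer names, the function names `layer_change_scores` and `rank_layers`, and the toy numbers are all hypothetical. Weights are plain Python lists standing in for real tensors.

```python
import math

def layer_change_scores(parent, child):
    """Relative L2 change per layer between two checkpoints.

    `parent` and `child` map layer names to flat lists of floats
    (a hypothetical stand-in for real weight tensors). The scoring
    rule is an assumption, not the paper's stated method.
    """
    scores = {}
    for name, w_parent in parent.items():
        w_child = child[name]
        diff = math.sqrt(sum((c - p) ** 2 for p, c in zip(w_parent, w_child)))
        base = math.sqrt(sum(p ** 2 for p in w_parent))
        # Fall back to the absolute change if the parent layer is all zeros.
        scores[name] = diff / base if base > 0 else diff
    return scores

def rank_layers(parent, child):
    """Layer names sorted from most changed to least changed."""
    scores = layer_change_scores(parent, child)
    return sorted(scores, key=scores.get, reverse=True)

# Toy checkpoints with made-up numbers (not taken from the paper).
parent = {
    "embed": [1.0, 0.0, 0.0, 1.0],
    "attn.0": [0.5, 0.5, 0.5, 0.5],
    "mlp.0": [1.0, 2.0, 2.0, 0.0],
}
child = {
    "embed": [1.0, 0.0, 0.0, 1.0],
    "attn.0": [0.5, 0.5, 0.5, 1.5],
    "mlp.0": [1.0, 2.0, 2.0, 0.3],
}

scores = layer_change_scores(parent, child)
ranking = rank_layers(parent, child)
```

Normalizing by the parent layer's norm keeps large layers from dominating the ranking simply because they hold more parameters. Whether the paper normalizes in this way is not stated in the summary.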

Abstract

Evolutionary methods have long been useful for analysis and explanation in genetics, biology, ecology, and related fields. In this work, we extend these methods to neural networks, specifically large language models (LLMs), to better analyze and explain relationships among models. We show how relating weights to genotypes and output text to phenotypes can improve our understanding of model lineage, important datasets, the roles of different model layers, and visualization of model relationships. We demonstrate this in a controlled experiment, where our estimated evolutionary trees reliably recover the topology of the ground-truth training tree. We further identify the most important weight layers according to weight differences and show through phenotypic experiments that one training dataset appears to contribute more useful information than the others. Finally, we generate an unsupervised evolutionary tree of black-box foundation models. Throughout, we provide visualizations that support a clearer understanding of evolutionary relationships among LLMs.
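
To make the tree-recovery claim concrete, the sketch below treats each model's flattened weights as a genotype and groups models by pairwise distance. It uses UPGMA (average-linkage clustering), a classical phylogenetics method chosen here for illustration. The abstract does not name the authors' tree estimator, so this is an assumption, as are the Euclidean distance, the model names, and the toy weight vectors.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two flat weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def upgma(genotypes):
    """Average-linkage (UPGMA) clustering over model genotypes.

    `genotypes` maps model names to flat lists of floats.
    Returns the estimated topology as nested tuples of model names.
    """
    # Keys are subtrees (a name or a nested tuple); values are leaf names.
    clusters = {name: [name] for name in genotypes}

    def dist(a, b):
        # Mean distance over all leaf pairs drawn from the two clusters.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        total = sum(euclidean(genotypes[x], genotypes[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into a new internal node.
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = merged
    return next(iter(clusters))

def topology(tree):
    """Order-independent form of a nested-tuple tree, for comparisons."""
    if isinstance(tree, str):
        return tree
    return frozenset(topology(t) for t in tree)

# Toy genotypes: two hypothetical fine-tuning branches, two models each.
genotypes = {
    "math_v1": [1.0, 0.0, 0.0],
    "math_v2": [1.1, 0.1, 0.0],
    "code_v1": [0.0, 1.0, 0.0],
    "code_v2": [0.0, 1.2, 0.1],
}

tree = upgma(genotypes)
recovered = topology(tree)
```

In this toy case the two `math_*` models and the two `code_*` models each form their own subtree, mirroring the branch structure they were constructed with. UPGMA places every model at a leaf, so it cannot by itself mark one model as the direct ancestor of another. For black-box models, where weights are unavailable, the same procedure could in principle run on a distance computed from generated text, though the summary does not specify which phenotypic distance the authors use.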