'The Order in the Horse's Heart': A Case Study in LLM-Assisted Stylometry for the Discovery of Biblical Allusion in Modern Literary Fiction

arXiv cs.CL / 4/22/2026


Key Points

  • The paper proposes a dual-track LLM-assisted stylometry pipeline to detect biblical allusions in modern literary fiction, demonstrated on Cormac McCarthy’s novels.
  • A bottom-up track uses inverse document frequency to find rare vocabulary shared with the King James Bible, embeds each occurrence in its local context for sense disambiguation, and passes candidate passage pairs through cascaded LLM review.
  • A top-down register track asks an LLM to read McCarthy’s prose without being pointed at any specific biblical passage, catching allusions that are not marked by rare words or phrases.
  • Both tracks are cross-validated by a long-context model that holds entire novels alongside the KJV in a single pass, and every finding is checked against published scholarship. The pipeline surfaces 349 allusions and independently recovers 62 of 115 previously documented ones (54% recall).
  • The authors frame the results in terms of the value LLMs add as assistants to mechanical stylometric analyses, and their potential to support the statistical study of intertextuality in massive literary corpora.
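
The first step of the bottom-up track, ranking shared vocabulary by rarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF formula, the function names, and the toy passages (invented for illustration, not quotations from McCarthy or the KJV) are all assumptions, and the embedding and LLM review stages are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a passage and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def idf_scores(documents):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each word once per document
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def rare_shared_vocabulary(source_passages, target_passages, background, top_k=5):
    """Return words present in both corpora, rarest (highest IDF) first."""
    idf = idf_scores(background + source_passages + target_passages)
    source_vocab = set().union(*(tokenize(p) for p in source_passages))
    target_vocab = set().union(*(tokenize(p) for p in target_passages))
    shared = source_vocab & target_vocab
    # Sort by descending IDF, breaking ties alphabetically for determinism.
    return sorted(shared, key=lambda w: (-idf[w], w))[:top_k]


# Toy data, invented for illustration only.
background = [
    "the man rode the horse across the plain",
    "the boy and the man ate by the fire",
    "the horse stood in the road",
]
scripture_like = [
    "and the firmament sheweth his handywork",
    "the man went out into the plain",
]
novel_like = ["the firmament wheeled above the horse and the man"]

print(rare_shared_vocabulary(scripture_like, novel_like, background, top_k=2))
# -> ['firmament', 'and']
```

Common words such as "the" appear in every document and sink to the bottom of the ranking, while "firmament" rises to the top. A ranking like this only proposes candidates; deciding whether a shared rare word is used in the same sense is what the contextual embedding and LLM review stages are described as handling.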

Abstract

We present a dual-track pipeline for detecting biblical allusions in literary fiction and apply it to the novels of Cormac McCarthy. A bottom-up embedding track uses inverse document frequency to identify rare vocabulary shared with the King James Bible, embeds occurrences in their local context for sense disambiguation, and passes candidate passage pairs through cascaded LLM review. A top-down register track asks an LLM to read McCarthy's prose undirected to any specific biblical passage for comparison, catching allusions not distinguished by word or phrase rarity. Both tracks are cross-validated by a long-context model that holds entire novels alongside the KJV in a single pass, and every finding is checked against published scholarship. Restricting attention to allusions that carry a textual echo--shared phrasing, reworked vocabulary, or transplanted cadence--and distinguishing literary allusions proper from signposted biblical references (similes naming biblical figures, characters overtly citing scripture), the pipeline surfaces 349 allusions across the corpus. Among a target set of 115 previously documented allusions retrieved through human review of the academic literature, the pipeline independently recovers 62 (54% recall), with recall varying by connection type from 30% (transformed imagery) to 80% (register collisions). We contextualise these results with respect to the value-add from LLMs as assistants to mechanical stylometric analyses, and their potential to facilitate the statistical study of intertextuality in massive literary corpora.
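
The headline recall figure follows directly from the two counts reported in the abstract. The short sketch below reproduces that arithmetic; the function name is illustrative, and no per-type counts are computed because the abstract gives only the percentages for those.

```python
def recall(recovered, documented):
    """Fraction of previously documented allusions that the pipeline recovered."""
    if documented <= 0:
        raise ValueError("documented must be a positive count")
    return recovered / documented


# Counts taken from the abstract: 62 recovered out of 115 documented.
overall = recall(62, 115)
print(f"{overall:.1%}")  # -> 53.9%, reported as 54% after rounding
```

Recall here is measured only against the 115-item target set drawn from the scholarly literature, so it says nothing about how many of the 349 surfaced allusions are correct.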