LLMbench: A Comparative Close Reading Workbench for Large Language Models
arXiv cs.AI, April 20, 2026
Key Points
- LLMbench is introduced as a browser-based workbench for side-by-side close reading of large language model (LLM) outputs, complementing the numerical rating metrics that dominate current evaluation practice.
- The system adds four analytical overlays—token-level log-probability inspection, word-level differences, Hyland-style tone/metadiscourse analysis, and sentence-level structure with discourse connective highlighting.
- It includes six analytical modes (among them stochastic variation, temperature gradients, prompt sensitivity, token probabilities, and cross-model divergence) that make the probabilistic structure behind generation interpretable at the token level.
- The tool visualizes outputs as probability distributions (using heatmaps, entropy sparklines, pixel maps, and 3D probability “terrains”) to reveal counterfactual histories of how each word could have emerged.
- The paper argues that log-probability data—currently underused in humanities and social-science readings—should be treated as a valuable resource for critical study of generative AI models.
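The token-level inspection the tool describes rests on a simple quantity: the entropy of the model's next-token distribution, which is what an "entropy sparkline" would plot per position. The sketch below is not the paper's code; the token strings and log-probabilities are invented, and it assumes only top-k alternatives are available (as with typical LLM APIs), so the distribution is renormalized.

```python
# Minimal sketch (not LLMbench's implementation): per-token Shannon
# entropy from top-k log-probabilities, the quantity an entropy
# sparkline would plot for each generated token.
import math

def token_entropy(logprobs):
    """Entropy (in nats) of a token distribution given log-probabilities.

    Renormalizes because only the top-k alternatives are usually known.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical top-3 alternatives for one generated token:
alts = {"cat": -0.2, "dog": -2.1, "fox": -3.5}
h = token_entropy(list(alts.values()))
```

A near-zero entropy marks a token the model was nearly forced to emit; a value near log(k) marks a genuine fork where the "counterfactual histories" the paper mentions diverge.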
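The word-level difference overlay can be approximated with standard sequence alignment; a plausible minimal version, using Python's stdlib `difflib` (the two sample sentences are invented, and this is a sketch rather than the workbench's actual algorithm):

```python
# Sketch of a word-level diff between two model outputs, as a
# comparison overlay might compute it. Alignment is done on word
# tokens with stdlib difflib.
import difflib

def word_diff(a, b):
    """Return (tag, words_from_a, words_from_b) chunks.

    Tags are 'equal', 'replace', 'delete', or 'insert'.
    """
    aw, bw = a.split(), b.split()
    sm = difflib.SequenceMatcher(a=aw, b=bw)
    return [(tag, aw[i1:i2], bw[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()]

chunks = word_diff("the model predicts rain today",
                   "the model forecasts rain tomorrow")
```

A renderer would then color `replace`/`insert`/`delete` chunks while leaving `equal` runs unhighlighted.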