MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator

arXiv cs.CL / 4/13/2026

Key Points

  • MuTSE is presented as a human-in-the-loop, interactive web application for evaluating LLM-generated text simplifications across arbitrary CEFR targets.
  • The system enables systematic testing of prompt-model permutations by running concurrent P×M combinations and producing a real-time comparison matrix.
  • It includes a tiered semantic alignment engine with a linearity bias heuristic (λ) and uses visual mapping to connect source sentences with their simplified outputs.
  • The authors position MuTSE as reducing the cognitive burden of qualitative comparison and improving reproducibility for downstream NLP dataset construction.
  • Code and a demo are made available for peer review via an OSF link, supporting adoption and evaluation by other researchers.
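
The P×M comparison matrix described in the key points can be pictured with a minimal sketch. This is not MuTSE's implementation: `simplify` is a hypothetical stand-in for an LLM call, and the prompt strings and model names are placeholders. The sketch only shows how every prompt-model pair might be run concurrently and collected into a grid with one row per prompt and one column per model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simplify(prompt: str, model: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call; a real client would go here."""
    return f"[{model} | {prompt}] {text}"


def run_matrix(prompts, models, text):
    """Run all P x M prompt-model pairs concurrently; return a P x M grid."""
    pairs = list(product(prompts, models))
    # One worker per pair, so slow calls do not block the others.
    with ThreadPoolExecutor(max_workers=max(1, len(pairs))) as pool:
        outputs = list(pool.map(lambda pm: simplify(pm[0], pm[1], text), pairs))
    results = dict(zip(pairs, outputs))
    # Rows follow the prompt order, columns follow the model order.
    return [[results[(p, m)] for m in models] for p in prompts]


# Placeholder prompts and model names, assumed for illustration only.
prompts = ["Rewrite at CEFR A2.", "Rewrite at CEFR B1."]
models = ["model-a", "model-b", "model-c"]
matrix = run_matrix(prompts, models, "The committee postponed the vote.")
```

Here `matrix[i][j]` holds the output of prompt `i` under model `j`, which is the shape a side-by-side comparison view would need.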

Abstract

As Large Language Models (LLMs) become increasingly prevalent in text simplification, systematically evaluating their outputs across diverse prompting strategies and architectures remains a critical methodological challenge in both NLP research and Intelligent Tutoring Systems (ITS). Developing robust prompts is often hindered by the absence of structured, visual frameworks for comparative text analysis. While researchers typically rely on static computational scripts, educators are constrained to standard conversational interfaces; neither paradigm supports systematic multi-dimensional evaluation of prompt-model permutations. To address these limitations, we introduce MuTSE, an interactive human-in-the-loop web application designed to streamline the evaluation of LLM-generated text simplifications across arbitrary CEFR proficiency targets. (The project code and the demo have been made available for peer review at the following anonymized URL: https://osf.io/njs43/overview?view_only=4b4655789f484110a942ebb7788cdf2a) The system supports concurrent execution of P × M prompt-model permutations, generating a comprehensive comparison matrix in real time. By integrating a novel tiered semantic alignment engine augmented with a linearity bias heuristic (λ), MuTSE visually maps source sentences to their simplified counterparts, reducing the cognitive load associated with qualitative analysis and enabling reproducible, structured annotation for downstream NLP dataset construction.
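
The abstract does not spell out how the linearity bias enters the alignment score, so the following is only a sketch under stated assumptions. It assumes that λ penalizes the distance between the relative positions of a source sentence and a simplified sentence, and it substitutes plain token-overlap (Jaccard) similarity for whatever semantic similarity the system actually uses. The tiering of the engine is not modeled, and the function names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(source, simplified, lam=0.3):
    """Map each simplified sentence to the index of its best source sentence.

    Assumed scoring rule: similarity minus lam times the gap between the
    two sentences' relative positions in their respective texts.
    """
    n, m = len(source), len(simplified)
    alignment = []
    for j, simp in enumerate(simplified):
        rel_j = j / max(m - 1, 1)
        best_i, best_score = None, float("-inf")
        for i, src in enumerate(source):
            rel_i = i / max(n - 1, 1)
            score = jaccard(src, simp) - lam * abs(rel_i - rel_j)
            if score > best_score:
                best_i, best_score = i, score
        alignment.append(best_i)
    return alignment


# A repeated source sentence makes similarity alone ambiguous.
source = ["the cat sat on the mat", "the dog ran home", "the cat sat on the mat"]
simplified = ["a cat sat on a mat", "the dog ran", "a cat sat on a mat"]
with_bias = align(source, simplified, lam=0.3)     # [0, 1, 2]
without_bias = align(source, simplified, lam=0.0)  # [0, 1, 0]
```

With λ set to zero, the last simplified sentence ties between the first and third source sentences and falls back to the first; a positive λ breaks the tie in favor of the sentence at the matching position, which is the intuition behind preferring roughly order-preserving alignments.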