SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs

arXiv cs.CL / 4/27/2026

Key Points

  • The paper highlights a vulnerability in educational LLMs called “pedagogical jailbreaks,” where students use prompts to force direct answers instead of receiving scaffolded instruction.
  • It formalizes what it means for educational LLMs to be safe, helpful, and pedagogical using a knowledge-mastery graph, aiming to enable systematic evaluation and research.
  • The authors introduce SHAPE, a benchmark containing 9,087 student-question pairs designed to test tutoring behavior under adversarial conditions.
  • They propose a graph-augmented tutoring pipeline that infers prerequisite concepts, detects mastery gaps, and uses explicit gating to choose between instructing and problem-solving.
  • Experiments across multiple LLMs show significantly improved safety under two pedagogical jailbreak settings while maintaining near-ceiling helpfulness under the same evaluation protocol. The code and data are publicly released.

Abstract

Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs, pedagogical jailbreaks, where students use answer-inducing prompts to elicit solutions rather than scaffolded instructions. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness under the same evaluation protocol. Our code and data are available at https://github.com/MAPS-research/SHaPE
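
The gating idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the graph contents, the `Student` class, the function names, and the 0.7 mastery threshold are all hypothetical, and the step of inferring the target concept from a free-text query is assumed to have already happened. The sketch only shows how a prerequisite graph plus per-concept mastery scores could route a request to either instruction or problem-solving.

```python
from dataclasses import dataclass, field

# Hypothetical prerequisite graph: concept -> concepts it directly depends on.
PREREQS = {
    "quadratic_formula": ["factoring", "square_roots"],
    "factoring": ["multiplication"],
    "square_roots": ["multiplication"],
    "multiplication": [],
}


@dataclass
class Student:
    # Hypothetical mastery estimates in [0, 1], keyed by concept name.
    mastery: dict = field(default_factory=dict)


def prerequisites(concept, graph):
    """Collect all transitive prerequisites of `concept` (excluding itself)."""
    seen = []
    stack = list(graph.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return seen


def mastery_gaps(student, concept, graph, threshold=0.7):
    """Return prerequisites whose estimated mastery is below `threshold`."""
    return sorted(
        c for c in prerequisites(concept, graph)
        if student.mastery.get(c, 0.0) < threshold
    )


def route(student, concept, graph, threshold=0.7):
    """Explicit gate: instruct if any prerequisite gap exists, else solve."""
    gaps = mastery_gaps(student, concept, graph, threshold)
    if gaps:
        return ("instruct", gaps)
    return ("solve", [])


if __name__ == "__main__":
    learner = Student({"multiplication": 0.9, "factoring": 0.4, "square_roots": 0.8})
    print(route(learner, "quadratic_formula", PREREQS))
```

In this sketch the gate is a hard rule applied outside the model, so an answer-inducing prompt cannot change the routing decision by itself; whether the paper's gating works this way is an assumption.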