SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs

arXiv cs.CL / 4/27/2026

Key Points

The paper highlights a vulnerability in educational LLMs called “pedagogical jailbreaks,” where students use prompts to force direct answers instead of receiving scaffolded instruction.
It formalizes what it means for educational LLMs to be safe, helpful, and pedagogical using a knowledge-mastery graph, aiming to enable systematic evaluation and research.
The authors introduce SHAPE, a benchmark containing 9,087 student-question pairs designed to test tutoring behavior under adversarial conditions.
They propose a graph-augmented tutoring pipeline that infers prerequisite concepts, detects mastery gaps, and uses explicit gating to choose between instructing and problem-solving.
Experiments across multiple LLMs show significantly improved safety under two pedagogical jailbreak settings while maintaining near-ceiling helpfulness under the same evaluation protocol. Code and data are publicly released.

Abstract

Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs, pedagogical jailbreaks, where students use answer-inducing prompts to elicit solutions rather than scaffolded instructions. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness under the same evaluation protocol. Our code and data are available at https://github.com/MAPS-research/SHaPE
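
The routing idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the prerequisite graph, the concept names, and the functions `prerequisites`, `mastery_gaps`, and `route` are all hypothetical, and the gating rule (instruct whenever any prerequisite is unmastered) is an assumption made for illustration. In the paper, prerequisite inference and generation are presumably handled by an LLM rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical prerequisite graph: concept -> list of direct prerequisites.
PREREQS = {
    "quadratic_formula": ["completing_the_square", "square_roots"],
    "completing_the_square": ["factoring"],
    "square_roots": [],
    "factoring": [],
}


def prerequisites(concept, graph):
    """Return all direct and transitive prerequisites of `concept`."""
    seen = set()
    queue = deque(graph.get(concept, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def mastery_gaps(concept, mastered, graph):
    """Prerequisites of `concept` that the student has not yet mastered."""
    return prerequisites(concept, graph) - set(mastered)


def route(concept, mastered, graph):
    """Assumed gating rule: instruct if any gap exists, otherwise solve."""
    gaps = mastery_gaps(concept, mastered, graph)
    if gaps:
        return "instruct", sorted(gaps)
    return "solve", []


if __name__ == "__main__":
    # Student has mastered two of the three prerequisites.
    print(route("quadratic_formula", {"factoring", "square_roots"}, PREREQS))
    # Student has mastered everything required.
    print(route("quadratic_formula",
                {"factoring", "square_roots", "completing_the_square"},
                PREREQS))
```

Under these assumptions, an answer-inducing prompt does not change the route on its own: the mode depends only on the student's mastery state relative to the graph, which is one plausible way an explicit gate could resist a pedagogical jailbreak.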