RoTRAG: Rule of Thumb Reasoning for Conversation Harm Detection with Retrieval-Augmented Generation
arXiv cs.CL / 4/21/2026
Key Points
- RoTRAG is a retrieval-augmented framework for detecting harm in multi-turn dialogues that reasons over full conversational context rather than isolated utterances.
- It grounds LLM-based harm assessment in concise, human-written moral “Rules of Thumb” (RoTs) retrieved from an external corpus, improving consistency and interpretability.
- The system performs turn-level reasoning and final severity classification using the retrieved normative evidence instead of relying only on parametric knowledge.
- To reduce cost, RoTRAG includes a lightweight binary routing classifier that determines whether a turn needs retrieval-based reasoning or can reuse existing context.
- Experiments on ProsocialDialog and Safety Reasoning Multi Turn Dialogue show about a 40% average relative F1 improvement and an 8.4% average relative reduction in distributional error, while lowering redundant computation.