Research Mistral Moderation API

Mistral AI Blog / 5/28/2026

📰 NewsDeveloper Stack & InfrastructureTools & Practical UsageModels & Research

Read original →

共有:

Key Points

Mistral AI has introduced a new “Moderation API” feature designed to help developers filter and assess potentially unsafe or policy-violating content in generated outputs.
The API is positioned as an easier way to add moderation capabilities to applications that use Mistral models, reducing the need to build custom safety pipelines.
The release targets practical deployment needs by integrating moderation as an API workflow rather than requiring model retraining or complex post-processing.
By providing standardized moderation checks, the Moderation API aims to improve compliance and safety handling across different application use cases.

Back to Blog

2 min read

Blog

Research

Mistral Moderation API

November 7, 2024

Mistral AI team

Back to Blog

2 min read

Share this post

Safety plays a key role in making AI useful. At Mistral AI, we believe that system level guardrails are critical to protecting downstream deployments.That's why we are releasing a new content moderation API. It is the same API that powers the moderation service in Le Chat. We are launching it to empower our users to utilize and tailor this tool to their specific applications and safety standards.

Over the past few months, we've seen growing enthusiasm across the industry and research community for new LLM based moderation systems, which can help make moderation more scalable and robust across applications. Our model is an LLM classifier trained to classify text inputs into 9 categories defined below. We are releasing two end-points: one for raw text and one for conversational content. Undesirable content is very specific to a given context, therefore we've trained our model to classify the last message of conversation within a conversational context. Check out our technical documentation for more information. The model is natively multilingual and in particular trained on Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish.

The Content Moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to LLM safety by addressing model-generated harms such as unqualified advice and PII. The full set of policy definitions and details on how to get started are available in our technical documentation .

Performance

We are sharing AUC PR across policies on our internal testset below.

We're working with our customers to build and share scalable, lightweight and customizable moderation tooling, and will continue to engage with the research community to contribute safety advancements to the broader field.

Black Hat USA

AI Business

YouTube adds new podcast features, including an AI recommendation tool and ‘Auto speed’

TechCrunch

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

Reddit r/MachineLearning

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B

Reddit r/LocalLLaMA

AiFinPay: Autonomous Payments for ruvnet/ruflo

Dev.to

Research Mistral Moderation API

Key Points

Mistral Moderation API

Performance

Related Articles

Black Hat USA

YouTube adds new podcast features, including an AI recommendation tool and ‘Auto speed’

Kept context-switching between arxiv, OpenReview, GitHub, and HuggingFace for every paper, so I built this. Chrome extension + website with everything inline, plus citation graph + SPECTER2 neighbors. 3M papers, free, feedback welcome [P]

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B

AiFinPay: Autonomous Payments for ruvnet/ruflo

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer