Issue #129: Mistral helps detect undesirable text content

Mistral launches new Moderation API

Nov 07, 2024

Welcome to Issue #129 of One Minute AI, your daily AI news companion. This issue discusses a recent announcement from Mistral.

Mistral launches new Moderation API

Mistral AI has introduced a content moderation API to enhance safety in AI applications. This service, integral to their Le Chat platform, enables users to detect undesirable text across nine policy categories, including unqualified advice and personally identifiable information (PII). The API offers two endpoints: one for raw text and another for conversational content, both supporting multiple languages such as Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

The moderation classifier is designed to provide effective guardrails by addressing potential model-generated harms. Mistral AI has shared performance metrics, including Area Under the Precision-Recall Curve (AUC PR) across various policies, demonstrating the classifier's efficacy. The company is collaborating with customers to develop scalable, lightweight, and customizable moderation tools and is actively engaging with the research community to advance safety measures in the AI field.

Read the official announcement

Want to help?

If you liked this issue, help spread the word and share One Minute AI with your peers and community.

Share One Minute AI

You can also share feedback with us, as well as news from the AI world that you’d like to see featured by joining our chat on Substack.

Join Team One Minute AI’s subscriber chat

Available in the Substack app and on web

One Minute AI

Discussion about this post

Ready for more?