Content moderation
Screen every message before it reaches the model.
Moderation screens incoming messages and withholds anything flagged before it reaches the model or storage. The user gets a polite, on-brand refusal instead.
What moderation screens
Protection that acts before it’s too late.
Six categories on by default
PII, sexual content, hate & discrimination, violence & threats, dangerous & criminal content and self-harm are on by default.
Opt-in categories
Health, financial and legal advice, plus jailbreak attempts, can be enabled on top.
Withheld before model & storage
Flagged messages are withheld before they reach the model or get stored.
Polite, on-brand refusal
Instead of a hard error, the user gets a polite refusal that matches your brand.
Calibrated verdict
The verdict is calibrated (provider: Mistral), and you can optionally set a custom sensitivity threshold.
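To illustrate what a sensitivity threshold does, here is a minimal sketch. It assumes the verdict is available as one score per category between 0 and 1. The function name, category keys, default value and scores are all hypothetical and do not reflect the product's or the provider's actual API.

```python
from typing import Dict, List

# Hypothetical default; the real calibrated value is not stated in this page.
DEFAULT_THRESHOLD = 0.5


def flagged_categories(
    scores: Dict[str, float], threshold: float = DEFAULT_THRESHOLD
) -> List[str]:
    """Return, in alphabetical order, the categories scoring at or above the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# Illustrative scores only, not real provider output.
example_scores = {"pii": 0.82, "violence": 0.41, "self_harm": 0.05}

strict = flagged_categories(example_scores, threshold=0.4)   # lower threshold flags more
lenient = flagged_categories(example_scores, threshold=0.9)  # higher threshold flags less
```

In this sketch a lower threshold makes the filter more sensitive, and a higher one lets more borderline messages through.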
Before processing
The filter acts on the way in, not just on the reply, so it protects the model, storage and your brand.
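The "on the way in" ordering can be sketched as a small gate that runs before anything else. This is an illustration under assumptions, not the product's implementation: `Pipeline`, `is_flagged`, `call_model` and `REFUSAL_TEXT` are hypothetical names, and the classifier is a toy stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical refusal wording; in practice this would be your brand's text.
REFUSAL_TEXT = "Sorry, I can't help with that request."


@dataclass
class Pipeline:
    is_flagged: Callable[[str], bool]   # stand-in for the moderation check
    call_model: Callable[[str], str]    # stand-in for the assistant model
    stored: List[str] = field(default_factory=list)

    def handle(self, message: str) -> str:
        # Screen first: a flagged message is neither stored nor sent to the model.
        if self.is_flagged(message):
            return REFUSAL_TEXT
        self.stored.append(message)
        return self.call_model(message)


model_calls: List[str] = []


def toy_model(message: str) -> str:
    model_calls.append(message)
    return "echo: " + message


pipeline = Pipeline(
    is_flagged=lambda m: "password" in m.lower(),  # toy rule, for illustration only
    call_model=toy_model,
)
blocked = pipeline.handle("My password is hunter2")
allowed = pipeline.handle("What are your opening hours?")
```

Because the check is the first step in `handle`, the flagged message never appears in `stored` or in `model_calls`.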
How to set up moderation.
- The six default categories are on out of the box — nothing to do.
- Choose any opt-in categories: health, financial and legal advice, plus jailbreak attempts.
- Optionally set a custom sensitivity threshold for the calibrated verdict.
- Tune the refusal text to your brand and test live — flagged content is withheld upfront.
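The setup steps above could be captured in a single configuration object. The sketch below is hypothetical throughout: the class, field names, category identifiers and the 0 to 1 threshold range are assumptions made for illustration, not the product's actual settings schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Category identifiers are hypothetical spellings of the categories named in the text.
DEFAULT_CATEGORIES: FrozenSet[str] = frozenset({
    "pii", "sexual_content", "hate_discrimination",
    "violence_threats", "dangerous_criminal", "self_harm",
})
OPT_IN_CATEGORIES: FrozenSet[str] = frozenset({
    "health_advice", "financial_advice", "legal_advice", "jailbreak",
})


@dataclass(frozen=True)
class ModerationConfig:
    opt_in: FrozenSet[str] = frozenset()
    threshold: Optional[float] = None  # None means: keep the calibrated default
    refusal_text: str = "Sorry, I can't help with that."

    def __post_init__(self) -> None:
        unknown = self.opt_in - OPT_IN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown opt-in categories: {sorted(unknown)}")
        # Assumed range for the threshold; the real scale is not documented here.
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")

    @property
    def active_categories(self) -> FrozenSet[str]:
        # The six defaults are always on; opt-ins are added on top.
        return DEFAULT_CATEGORIES | self.opt_in


config = ModerationConfig(
    opt_in=frozenset({"jailbreak"}),
    threshold=0.7,
    refusal_text="We can't help with that here, but we're happy to assist otherwise.",
)
```

Leaving every field at its default mirrors the first step: the six default categories are active with nothing to configure.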
Frequently asked questions
Build an assistant you can trust live.
14-day free trial. No credit card. German & English.