What happens to a flagged message?

It is withheld before it reaches the model or gets stored. The user receives a polite, on-brand refusal instead.

How reliable is the verdict?

The verdict is calibrated and comes from the provider Mistral. You can optionally set your own sensitivity threshold.

Is moderation GDPR-compliant?

Yes. Kyros is hosted in the EU, and because flagged messages are withheld before storage, they are never stored in the first place.

Are there more categories?

Yes, as opt-ins: health, financial and legal advice, plus jailbreak attempts, can be enabled on top of the defaults.

Content moderation

Screen every message before it reaches the model.

Moderation screens incoming messages and withholds anything flagged, before the model and before storage, and answers with a polite, on-brand refusal.

What moderation screens

Protection that acts before it’s too late.

Six categories on by default

PII, sexual content, hate & discrimination, violence & threats, dangerous & criminal content and self-harm are on by default.

Opt-in categories

Health, financial and legal advice, plus jailbreak attempts, can be enabled on top.

Withheld before model & storage

Flagged messages are withheld before they reach the model or get stored.

Polite, on-brand refusal

Instead of a hard error, the user gets a polite refusal that matches your brand.

Calibrated verdict

The verdict is calibrated (provider: Mistral) — with an optional custom sensitivity threshold.
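
As a rough illustration of what a custom sensitivity threshold could mean in practice, the sketch below compares per-category scores against a cutoff. The score range (0 to 1), the default of 0.5, and every name in the code are assumptions made for this example. They are not the actual Kyros or Mistral API.

```python
from dataclasses import dataclass

# Hypothetical default cutoff; the real default is not stated on this page.
DEFAULT_THRESHOLD = 0.5


@dataclass(frozen=True)
class Verdict:
    flagged: bool
    categories: tuple[str, ...]  # categories whose score reached the threshold


def decide(scores: dict[str, float], threshold: float = DEFAULT_THRESHOLD) -> Verdict:
    """Flag a message if any category score reaches the threshold.

    `scores` is assumed to map category names to calibrated scores in [0, 1].
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    hits = tuple(sorted(name for name, score in scores.items() if score >= threshold))
    return Verdict(flagged=bool(hits), categories=hits)


scores = {"pii": 0.62, "self_harm": 0.03}
print(decide(scores))                 # flagged under the assumed default
print(decide(scores, threshold=0.8))  # a higher cutoff lets the same message pass
```

In this sketch a lower threshold makes moderation more sensitive and a higher one makes it more permissive. Whether the real setting works in the same direction is something to confirm in the product itself.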

Before processing

The filter acts on the way in — not just in the reply — protecting the model, storage and your brand.
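
The ordering described above (screen first, then store, then call the model) can be sketched as a small gate function. This is only an illustration of the idea: `handle_message`, the callables it takes, and the refusal text are hypothetical names invented for this example, not part of any Kyros interface.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that request."  # placeholder refusal text


def handle_message(
    text: str,
    is_flagged: Callable[[str], bool],
    call_model: Callable[[str], str],
    store: Callable[[str], None],
) -> str:
    """Screen first; only unflagged messages reach storage and the model."""
    if is_flagged(text):
        # Withheld: neither stored nor sent to the model.
        return REFUSAL
    store(text)
    return call_model(text)


# Toy stand-ins so the sketch runs on its own.
stored: list[str] = []


def toy_is_flagged(text: str) -> bool:
    return "credit card" in text.lower()


def toy_model(text: str) -> str:
    return f"echo: {text}"


print(handle_message("My credit card number is ...", toy_is_flagged, toy_model, stored.append))
print(handle_message("What are your opening hours?", toy_is_flagged, toy_model, stored.append))
print(stored)  # only the unflagged message was stored
```

Because the check runs before `store` and `call_model`, a flagged message leaves no trace in either place, which is the property the page describes.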

How to set up moderation.

1. The six default categories are on out of the box, so there is nothing to do.
2. Choose opt-in categories: health, financial and legal advice, plus jailbreak attempts.
3. Optionally set a custom sensitivity threshold for the calibrated verdict.
4. Tune the refusal text to your brand and test live. Flagged content is withheld upfront.
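
To make the setup steps above concrete, here is a minimal sketch of how such a configuration could be modeled. The category identifiers, the `ModerationConfig` class, and its fields are all hypothetical, and the page does not say whether setup happens in a dashboard or in code. Treat this as a mental model and not as the product's real configuration format.

```python
from dataclasses import dataclass
from typing import Optional

# Identifiers are invented for this sketch; the category names come from the page.
DEFAULT_CATEGORIES = frozenset({
    "pii", "sexual_content", "hate_discrimination",
    "violence_threats", "dangerous_criminal", "self_harm",
})
OPT_IN_CATEGORIES = frozenset({
    "health_advice", "financial_advice", "legal_advice", "jailbreak",
})


@dataclass
class ModerationConfig:
    opt_in: frozenset = frozenset()
    threshold: Optional[float] = None  # None = keep the built-in sensitivity
    refusal_text: str = "Sorry, I can't help with that request."

    def __post_init__(self) -> None:
        unknown = set(self.opt_in) - OPT_IN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown opt-in categories: {sorted(unknown)}")
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")

    @property
    def active_categories(self) -> frozenset:
        # The six defaults are on out of the box; chosen opt-ins are added on top.
        return DEFAULT_CATEGORIES | frozenset(self.opt_in)


config = ModerationConfig(
    opt_in=frozenset({"jailbreak", "legal_advice"}),
    threshold=0.7,
    refusal_text="I'm not able to help with that here. Is there something else I can do?",
)
print(sorted(config.active_categories))
```

Validating the opt-in names up front means a typo in a category name fails loudly instead of silently leaving that category switched off.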

Frequently asked questions

Which categories are on by default?

Six categories are on out of the box: PII, sexual content, hate & discrimination, violence & threats, dangerous & criminal content and self-harm.

Related

Security

EU hosting, encryption and audit log.

GDPR

DPA, TOMs and data processing.

Chat widget

Moderation applies inside every embeddable chat.

Pricing

Transparent credits, Starter free forever.

Build an assistant you can trust live.

14-day free trial. No credit card. German & English.

