NSFW Detection
Detect sexual and explicit content in LLM inputs and outputs.
The NSFW Detection provider identifies sexual, explicit, and adult content that is inappropriate for general audiences. It flags both overt explicit material and suggestive content that crosses professional boundaries.
What it detects
| Category | Examples |
|---|---|
| Sexual content | Explicit sexual descriptions, pornographic material |
| Suggestive content | Sexually suggestive language, innuendo intended to provoke |
| Adult solicitation | Requests to generate explicit or adult-only material |
Configuration
{
  "policy_type": "nsfw_detection",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | 0.5 | Confidence threshold (0–1). Lower values catch more borderline content. |
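
As a rough illustration of the configuration above, the sketch below builds the same policy object in Python and rejects thresholds outside the documented 0–1 range. The helper name `build_nsfw_policy` is hypothetical and not part of any SDK; only the JSON field names and the default of 0.5 come from this page.

```python
import json


def build_nsfw_policy(threshold: float = 0.5, mode: str = "blocking") -> dict:
    """Return a policy dict shaped like the configuration on this page.

    Hypothetical helper for illustration; it only assembles and checks the JSON.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError("threshold must be a number")
    # The parameter table documents the valid range as 0 to 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {
        "policy_type": "nsfw_detection",
        "mode": mode,
        "config": {"threshold": float(threshold)},
    }


if __name__ == "__main__":
    # A slightly stricter setting than the default, e.g. for a public-facing app.
    print(json.dumps(build_nsfw_policy(threshold=0.4), indent=2))
```

Validating the threshold before the policy is submitted keeps a typo such as `5` instead of `0.5` from silently changing what gets flagged.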
Example violation
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "nsfw_detection",
      "severity": "high",
      "description": "Explicit sexual content detected",
      "confidence": 0.95
    }
  ]
}

Best practices
- Keep the threshold at `0.5` or lower for any public-facing or workplace application.
- Always use `mode: "blocking"`; NSFW content rarely warrants a warning-only approach.
- Combine with Toxicity Detection to cover the full spectrum of harmful content.
- Test with edge cases relevant to your domain — medical or educational content may require threshold tuning.
- Monitor false positives in the dashboard, especially if your application handles health or anatomy topics.
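
One way to approach the threshold tuning mentioned above is to score a small hand-labeled sample set and count false positives and misses at several candidate thresholds. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the sample scores are made-up illustrative values rather than measurements, and it assumes content is flagged when its confidence is at or above the threshold, which this page does not specify. `nsfw_violations` shows how the confidence might be pulled from a response shaped like the example violation.

```python
def nsfw_violations(response: dict) -> list:
    """Pick the NSFW entries out of a response shaped like the example violation."""
    return [
        v for v in response.get("violations", [])
        if v.get("policy_type") == "nsfw_detection"
    ]


def sweep_thresholds(samples, thresholds):
    """Count false positives and misses per threshold.

    samples: list of (confidence, is_nsfw) pairs labeled by a human reviewer.
    Assumes a sample is flagged when confidence >= threshold.
    """
    results = []
    for t in thresholds:
        false_positives = sum(1 for conf, is_nsfw in samples if conf >= t and not is_nsfw)
        missed = sum(1 for conf, is_nsfw in samples if conf < t and is_nsfw)
        results.append({"threshold": t, "false_positives": false_positives, "missed": missed})
    return results


# Illustrative, made-up scores: (confidence, reviewer says it is NSFW)
SAMPLES = [
    (0.95, True),   # explicit text
    (0.62, True),   # suggestive text
    (0.55, False),  # anatomy lecture excerpt
    (0.45, True),   # borderline solicitation
    (0.30, False),  # health FAQ
]

if __name__ == "__main__":
    response = {
        "allowed": False,
        "violations": [{"policy_type": "nsfw_detection", "confidence": 0.95}],
    }
    print([v["confidence"] for v in nsfw_violations(response)])
    for row in sweep_thresholds(SAMPLES, [0.3, 0.5, 0.7]):
        print(row)
```

With these sample values, lowering the threshold trades missed detections for more false positives on the health and anatomy items, which is the trade-off to watch when tuning for medical or educational content.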