NSFW Detection

The NSFW Detection provider identifies sexual, explicit, and adult content that is inappropriate for general audiences. It flags both overt explicit material and suggestive content that crosses professional boundaries.

What it detects

Category	Examples
Sexual content	Explicit sexual descriptions, pornographic material
Suggestive content	Sexually suggestive language, innuendo intended to provoke
Adult solicitation	Requests to generate explicit or adult-only material

Configuration

{
  "policy_type": "nsfw_detection",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

Parameter	Type	Default	Description
`threshold`	float	`0.5`	Confidence threshold (0–1). Lower values catch more borderline content but also flag more benign content (false positives).
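To assemble this policy in code, you can build the same JSON shape and check the documented 0–1 range for `threshold` before sending it. The sketch below is illustrative only. `make_nsfw_policy` is a hypothetical helper, not part of the product API, and how the policy is submitted to the service is not covered on this page.

```python
import json


def make_nsfw_policy(threshold=0.5, mode="blocking"):
    """Build an NSFW Detection policy dict in the shape shown in the Configuration example.

    Hypothetical helper for illustration; not part of the product API.
    """
    # The documented range for threshold is 0 to 1 inclusive.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return {
        "policy_type": "nsfw_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


# A stricter setting: a lower threshold flags more borderline content.
policy = make_nsfw_policy(threshold=0.4)
print(json.dumps(policy, indent=2))
```

Validating the range locally surfaces a typo such as `4` instead of `0.4` before the policy reaches the service.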

Example violation

{
  "allowed": false,
  "violations": [
    {
      "policy_type": "nsfw_detection",
      "severity": "high",
      "description": "Explicit sexual content detected",
      "confidence": 0.95
    }
  ]
}
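A caller typically checks `allowed` first and then reads the `violations` entries to log or explain the decision. The sketch below parses the example response above. `nsfw_violations` and `should_block` are hypothetical helpers, and treating a missing `allowed` field as a block (failing closed) is an assumption on our part, not documented behavior.

```python
import json

# The example violation response from above, pasted in as a string.
raw = """
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "nsfw_detection",
      "severity": "high",
      "description": "Explicit sexual content detected",
      "confidence": 0.95
    }
  ]
}
"""


def nsfw_violations(result):
    """Return only the NSFW Detection entries from a check result (hypothetical helper)."""
    return [
        v for v in result.get("violations", [])
        if v.get("policy_type") == "nsfw_detection"
    ]


def should_block(result):
    """Block when the service reports allowed == false.

    Assumption: a missing "allowed" field is treated as a block (fail closed).
    """
    return not result.get("allowed", False)


result = json.loads(raw)
if should_block(result):
    for v in nsfw_violations(result):
        print(f"[{v['severity']}] {v['description']} (confidence {v['confidence']:.2f})")
```

For the example response, this prints one line for the high-severity violation at confidence 0.95.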

Best practices

Keep the threshold at 0.5 or lower for any public-facing or workplace application.
Always use `"mode": "blocking"`; NSFW content rarely warrants a warning-only approach.
Combine with Toxicity Detection to cover a broader range of harmful content.
Test with edge cases relevant to your domain — medical or educational content may require threshold tuning.
Monitor false positives in the dashboard, especially if your application handles health or anatomy topics.
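One way to tune the threshold for your domain is to score a small set of edge cases that a reviewer has labeled, then compare how many NSFW items are caught and how many safe items are wrongly flagged at each candidate threshold. The sketch below assumes you already have a confidence score for each test input, and that an input is flagged when its score is greater than or equal to the threshold; both are assumptions, and the samples are made up for illustration, not real model output. `evaluate` is a hypothetical helper.

```python
# Made-up labeled samples: (confidence score for the input, reviewer judged it NSFW).
# Replace with scores from your own domain, such as medical or anatomy text.
samples = [
    (0.95, True),   # explicit content
    (0.72, True),   # suggestive content
    (0.48, True),   # borderline innuendo
    (0.55, False),  # anatomy lesson
    (0.31, False),  # clinical description
    (0.10, False),  # unrelated text
]


def evaluate(threshold, samples):
    """Count NSFW items caught and safe items wrongly flagged at a threshold.

    Assumes an input is flagged when its confidence is >= threshold.
    """
    caught = sum(1 for score, is_nsfw in samples if is_nsfw and score >= threshold)
    false_positives = sum(1 for score, is_nsfw in samples if not is_nsfw and score >= threshold)
    return caught, false_positives


total_nsfw = sum(1 for _, is_nsfw in samples if is_nsfw)
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    caught, fp = evaluate(t, samples)
    print(f"threshold={t:.1f}  caught {caught}/{total_nsfw}  false positives {fp}")
```

With these illustrative numbers, dropping the threshold from 0.5 to 0.4 catches the borderline innuendo but still flags the anatomy lesson, which is the kind of trade-off worth reviewing in the dashboard for health or anatomy content.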
