Detect self-harm, drugs, violence, and terrorism content in LLM inputs and outputs.

Sensitive Topics

The Sensitive Topics provider identifies content related to dangerous or highly sensitive subject matter that most applications should not engage with. It covers categories where LLM responses could cause real-world harm.

What it detects

Category	Examples
Self-harm	Suicide methods, self-injury encouragement, pro-anorexia content
Drugs	Drug manufacturing instructions, substance abuse promotion
Violence	Graphic violence descriptions, instructions for causing harm
Terrorism	Radicalization content, attack planning, extremist propaganda

Configuration

{
  "policy_type": "sensitive_topics",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

Parameter	Type	Default	Description
`threshold`	float	`0.5`	Confidence threshold (0–1). Lower values increase sensitivity to borderline content.
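
As a rough illustration of how the threshold interacts with a detection confidence, the sketch below assumes that content is flagged when its confidence is at or above the threshold. That comparison rule and the helper name `is_flagged` are assumptions made for illustration, not documented provider behavior.

```python
def is_flagged(confidence: float, threshold: float = 0.5) -> bool:
    """Assumed rule: flag when confidence >= threshold (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return confidence >= threshold


# A borderline score passes at the default threshold but is flagged at a lower one.
borderline = 0.4
default_result = is_flagged(borderline)       # default threshold of 0.5
strict_result = is_flagged(borderline, 0.3)   # lower threshold, more sensitive
```

Under this assumption, lowering the threshold from 0.5 to 0.3 catches the borderline item at the cost of more false positives.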

Example violation

{
  "allowed": false,
  "violations": [
    {
      "policy_type": "sensitive_topics",
      "severity": "critical",
      "description": "Self-harm content detected: discussion of harmful methods",
      "topic": "self_harm",
      "confidence": 0.88
    }
  ]
}
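
A response shaped like the one above could be inspected along these lines. This is a minimal sketch that only parses the example payload; the helper name `sensitive_topic_violations` is hypothetical, and how the response is actually obtained is left out.

```python
import json

# The example violation payload from this page, as raw JSON text.
response_text = """
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "sensitive_topics",
      "severity": "critical",
      "description": "Self-harm content detected: discussion of harmful methods",
      "topic": "self_harm",
      "confidence": 0.88
    }
  ]
}
"""


def sensitive_topic_violations(payload: dict) -> list:
    """Return only the violations raised by the sensitive_topics policy."""
    return [
        v for v in payload.get("violations", [])
        if v.get("policy_type") == "sensitive_topics"
    ]


payload = json.loads(response_text)
violations = sensitive_topic_violations(payload)
for v in violations:
    # Log one summary line per violation for later review.
    print(f'{v["topic"]} ({v["severity"]}, confidence {v["confidence"]:.2f}): {v["description"]}')
```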

Best practices

Use `"mode": "blocking"` for all sensitive topic categories, since these are high-risk by nature.
Set a lower threshold (e.g., 0.3) for self-harm detection where false negatives carry serious consequences.
Combine with Toxicity Detection for broader content safety coverage.
Ensure your application provides appropriate crisis resources (e.g., helpline numbers) when self-harm content is detected.
Review flagged content regularly to ensure legitimate educational or news discussions are not over-blocked.
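
Two of the practices above, a stricter threshold and surfacing crisis resources, might be wired into an application roughly as follows. This is a sketch under assumptions: the message texts are placeholders, `user_message_for` is a hypothetical helper, and the result dictionary is assumed to match the example violation shape shown earlier. Because the documented configuration exposes a single `threshold`, the lower value here applies to the whole policy, not to self-harm alone.

```python
from typing import Optional

# Policy configuration with a stricter threshold, in the documented shape.
strict_policy = {
    "policy_type": "sensitive_topics",
    "mode": "blocking",
    "config": {"threshold": 0.3},
}

# Placeholder texts; substitute vetted, region-appropriate resources.
CRISIS_MESSAGE = (
    "If you are struggling, you are not alone. "
    "Please contact a local crisis helpline or emergency services."
)
GENERIC_MESSAGE = "This request cannot be completed."


def user_message_for(result: dict) -> Optional[str]:
    """Pick a user-facing message for a blocked result, or None if allowed."""
    if result.get("allowed", True):
        return None
    topics = {
        v.get("topic")
        for v in result.get("violations", [])
        if v.get("policy_type") == "sensitive_topics"
    }
    # Self-harm takes priority so crisis resources are always shown.
    if "self_harm" in topics:
        return CRISIS_MESSAGE
    return GENERIC_MESSAGE


blocked = {
    "allowed": False,
    "violations": [{"policy_type": "sensitive_topics", "topic": "self_harm"}],
}
message = user_message_for(blocked)
```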
