Sensitive Topics
Detect self-harm, drugs, violence, and terrorism content in LLM inputs and outputs.
The Sensitive Topics provider identifies content related to dangerous or highly sensitive subject matter that most applications should not engage with. It covers categories where LLM responses could cause real-world harm.
What it detects
| Category | Examples |
|---|---|
| Self-harm | Suicide methods, self-injury encouragement, pro-anorexia content |
| Drugs | Drug manufacturing instructions, substance abuse promotion |
| Violence | Graphic violence descriptions, instructions for causing harm |
| Terrorism | Radicalization content, attack planning, extremist propaganda |
Configuration
{
  "policy_type": "sensitive_topics",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | 0.5 | Confidence threshold (0–1). Lower values increase sensitivity to borderline content. |
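As a rough illustration of how the threshold might interact with a confidence score, the Python sketch below builds the policy shown above and applies a simple comparison. The helper names `build_sensitive_topics_policy` and `would_flag` are hypothetical, and the rule "flag when confidence is at or above the threshold" is an assumption. The provider's actual decision logic is not documented here.

```python
import json


def build_sensitive_topics_policy(threshold: float = 0.5, mode: str = "blocking") -> dict:
    """Return a policy dict in the documented shape (hypothetical helper)."""
    # The documented range for threshold is 0 to 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return {
        "policy_type": "sensitive_topics",
        "mode": mode,
        "config": {"threshold": threshold},
    }


def would_flag(confidence: float, threshold: float) -> bool:
    """Assumed rule: flag content whose confidence meets or exceeds the threshold."""
    return confidence >= threshold


if __name__ == "__main__":
    policy = build_sensitive_topics_policy(threshold=0.3)
    print(json.dumps(policy, indent=2))
    # A borderline score of 0.4 is flagged at 0.3 but not at the default 0.5.
    print(would_flag(0.4, policy["config"]["threshold"]))
    print(would_flag(0.4, 0.5))
```

Under this assumed rule, lowering the threshold from 0.5 to 0.3 causes a borderline score of 0.4 to be flagged, which matches the "lower values increase sensitivity" description.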
Example violation
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "sensitive_topics",
      "severity": "critical",
      "description": "Self-harm content detected: discussion of harmful methods",
      "topic": "self_harm",
      "confidence": 0.88
    }
  ]
}

Best practices
- Use `mode: "blocking"` for all sensitive topic categories; these are high-risk by nature.
- Set a lower threshold (e.g., `0.3`) for self-harm detection, where false negatives carry serious consequences.
- Combine with Toxicity Detection for full content safety coverage.
- Ensure your application provides appropriate crisis resources (e.g., helpline numbers) when self-harm content is detected.
- Review flagged content regularly to ensure legitimate educational or news discussions are not over-blocked.
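
The crisis-resource practice above could be wired into an application roughly as follows. This is a minimal Python sketch that assumes the result has the same shape as the example violation. The function `user_message_for` and both message strings are hypothetical placeholders, not part of the provider.

```python
from typing import Optional

# Placeholder texts (assumptions); replace with resources suited to your users and region.
CRISIS_MESSAGE = "If you are struggling, please reach out to a local crisis helpline."
REFUSAL_MESSAGE = "This request cannot be processed."


def user_message_for(result: dict) -> Optional[str]:
    """Pick a user-facing message for a policy result (hypothetical helper).

    Returns None when the content is allowed, a crisis-resource message when
    any violation has topic "self_harm", and a generic refusal otherwise.
    """
    if result.get("allowed", False):
        return None
    topics = {v.get("topic") for v in result.get("violations", [])}
    if "self_harm" in topics:
        return CRISIS_MESSAGE
    return REFUSAL_MESSAGE


if __name__ == "__main__":
    # Same shape as the example violation in this document.
    example = {
        "allowed": False,
        "violations": [
            {
                "policy_type": "sensitive_topics",
                "severity": "critical",
                "description": "Self-harm content detected: discussion of harmful methods",
                "topic": "self_harm",
                "confidence": 0.88,
            }
        ],
    }
    print(user_message_for(example))
```

Keeping the self-harm branch separate from the generic refusal makes it harder for the crisis message to be dropped when other categories are handled.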