Detects psychological manipulation, social engineering, and emotional coercion in prompts.

Unusual Prompt

Unusual Prompt detection identifies social engineering and psychological manipulation techniques in user inputs. It catches attempts to use emotional coercion, urgency fabrication, and authority impersonation to push an LLM beyond its intended boundaries.

What it detects

Psychological manipulation tactics
Social engineering patterns (authority impersonation, urgency fabrication)
Emotional coercion ("if you don't help me, something bad will happen")
Guilt-based manipulation
Flattery-based compliance attacks
Threat-based prompt manipulation

Configuration

{
  "policy_type": "unusual_prompt",
  "mode": "blocking",
  "config": {
    "use_llm": true
  }
}
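
If you generate policy configuration programmatically, the entry above can be produced by a small helper. The following is a minimal sketch: the function name `build_unusual_prompt_policy` is hypothetical and not part of any SDK, and it only reproduces the fields shown in the JSON above.

```python
import json


def build_unusual_prompt_policy(use_llm: bool = True) -> dict:
    """Return a policy entry shaped like the configuration shown above.

    Hypothetical helper: only the fields documented on this page are emitted.
    """
    return {
        "policy_type": "unusual_prompt",
        "mode": "blocking",
        "config": {"use_llm": use_llm},
    }


# Heuristic-only variant, e.g. for latency-sensitive paths.
fast_policy = build_unusual_prompt_policy(use_llm=False)
print(json.dumps(fast_policy, indent=2))
```

Keeping `use_llm` as the only parameter mirrors the single option documented here; how the resulting JSON is submitted depends on your deployment and is not shown.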

Example violation

{
  "policy_type": "unusual_prompt",
  "severity": "high",
  "description": "Emotional coercion detected in user input",
  "details": {
    "manipulation_type": "emotional_coercion",
    "confidence": 0.88,
    "analysis": "User employs guilt and urgency to override safety guidelines"
  }
}
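
If your application receives violation objects like the one above and wants to apply its own policy on top, a handler might look like the sketch below. It assumes the violation arrives as a JSON string with the fields shown; the `should_block` function and the `MIN_CONFIDENCE` threshold are illustrative assumptions, not documented behavior.

```python
import json

# Hypothetical threshold; tune it against your own traffic.
MIN_CONFIDENCE = 0.8


def should_block(violation: dict, min_confidence: float = MIN_CONFIDENCE) -> bool:
    """Block only high-severity unusual_prompt violations above the threshold."""
    if violation.get("policy_type") != "unusual_prompt":
        return False
    confidence = violation.get("details", {}).get("confidence", 0.0)
    return violation.get("severity") == "high" and confidence >= min_confidence


# The example violation from this page, parsed from its JSON form.
raw = """
{
  "policy_type": "unusual_prompt",
  "severity": "high",
  "description": "Emotional coercion detected in user input",
  "details": {
    "manipulation_type": "emotional_coercion",
    "confidence": 0.88,
    "analysis": "User employs guilt and urgency to override safety guidelines"
  }
}
"""
violation = json.loads(raw)
print(should_block(violation))
```

Reading `details` with `.get` keeps the handler tolerant of violations that omit optional fields.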

Best practices

Enable use_llm for higher accuracy on nuanced manipulation attempts
Disable use_llm if latency is a concern; heuristic detection still catches common patterns
Combine with jailbreak detection to cover both technical and social attack vectors
Review flagged prompts periodically to refine detection for your user base
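
For the periodic review, one simple starting point is to count flagged prompts by manipulation type. The sketch below assumes you have collected violation objects shaped like the example on this page; the `summarize_by_type` helper is hypothetical, and every `manipulation_type` value other than `emotional_coercion` is made up for illustration.

```python
from collections import Counter


def summarize_by_type(violations: list) -> Counter:
    """Count violations per details.manipulation_type ("unknown" if absent)."""
    return Counter(
        v.get("details", {}).get("manipulation_type", "unknown")
        for v in violations
    )


# Illustrative sample data; only "emotional_coercion" appears in the docs above.
flagged = [
    {"details": {"manipulation_type": "emotional_coercion", "confidence": 0.88}},
    {"details": {"manipulation_type": "emotional_coercion", "confidence": 0.91}},
    {"details": {"manipulation_type": "authority_impersonation", "confidence": 0.75}},
    {"details": {}},
]

summary = summarize_by_type(flagged)
for manipulation_type, count in summary.most_common():
    print(f"{manipulation_type}: {count}")
```

A breakdown like this shows which tactics dominate for your user base, which can guide where to look first when reviewing false positives.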
