Nyraxis AI

Prompt Injection

Detects instruction override and system prompt extraction attempts using multi-layer analysis.

Prompt Injection

Prompt Injection detection identifies attempts to override your system instructions or extract your system prompt. Using multi-layer detection combining heuristic patterns and ML classification, it catches both known attack templates and novel injection techniques before they reach your LLM.

What it detects

  • Instruction override attempts ("ignore previous instructions")
  • System prompt extraction ("repeat your system prompt")
  • Role reassignment attacks ("you are now a different AI")
  • Delimiter injection (fake system message boundaries)
  • Indirect injection via embedded content
  • Encoded or obfuscated injection payloads

Configuration

{
  "policy_type": "prompt_injection",
  "mode": "blocking",
  "config": {
    "detection_mode": "thorough",
    "threshold": 0.80
  }
}

Example violation

{
  "policy_type": "prompt_injection",
  "severity": "high",
  "description": "Instruction override attempt detected in user input",
  "details": {
    "attack_type": "instruction_override",
    "confidence": 0.94,
    "detection_layer": "ml_classifier"
  }
}

Best practices

  • Use thorough mode for production systems handling untrusted user input
  • Use fast mode for low-latency applications where speed is critical
  • Set threshold lower (0.70) during initial deployment to catch more attempts, then tune upward
  • Combine with jailbreak detection for comprehensive prompt-level protection

On this page