Nyraxis AI

Bias Detection

Detect gender, racial, age, religious, and political bias in LLM inputs and outputs.

The Bias Detection provider identifies biased language and stereotyping across five dimensions: gender, race, age, religion, and politics. It uses zero-shot classification, so it can detect bias without requiring pre-defined examples for each category.

What it detects

| Category | Examples |
|---|---|
| Gender bias | Stereotyping based on gender, sexist assumptions |
| Racial bias | Racial stereotypes, discriminatory generalizations |
| Age bias | Ageist assumptions, age-based discrimination |
| Religious bias | Religious stereotyping, faith-based prejudice |
| Political bias | Partisan framing, politically charged generalizations |

Configuration

{
  "policy_type": "bias_detection",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | float | 0.5 | Confidence threshold (0–1). Lower values increase sensitivity to subtle bias. |
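
As a rough illustration, a policy object like the one above could be built programmatically, with the threshold range checked before it is submitted. This is a minimal sketch under stated assumptions: `build_bias_policy` is a hypothetical helper, not part of any documented SDK, and the `"warning"` mode string is assumed from the warning mode mentioned under Best practices (only `"blocking"` appears in the example config).

```python
import json

# Assumption: "warning" is the literal mode value; the docs only show "blocking".
ALLOWED_MODES = {"blocking", "warning"}


def build_bias_policy(threshold: float = 0.5, mode: str = "blocking") -> dict:
    """Return a bias_detection policy dict shaped like the example config.

    Hypothetical helper for illustration only.
    """
    # The documented range for threshold is 0 to 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {
        "policy_type": "bias_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


# Serialize a stricter, non-blocking policy to JSON.
policy_json = json.dumps(build_bias_policy(threshold=0.3, mode="warning"), indent=2)
print(policy_json)
```

Validating the range locally catches a mistyped threshold (for example `5` instead of `0.5`) before the policy reaches the service.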

Example violation

{
  "allowed": false,
  "violations": [
    {
      "policy_type": "bias_detection",
      "severity": "medium",
      "description": "Gender bias detected: stereotyping based on gender roles",
      "bias_type": "gender",
      "confidence": 0.78
    }
  ]
}
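
A response of this shape could be handled along the following lines. This is only a sketch: `summarize_violations` is a hypothetical helper, and it assumes every bias violation carries the `bias_type`, `severity`, `confidence`, and `description` fields shown in the example.

```python
import json


def summarize_violations(response: dict) -> list[str]:
    """Return one readable line per bias_detection violation in a response.

    Hypothetical helper; assumes the field names from the example violation.
    """
    lines = []
    for violation in response.get("violations", []):
        # Skip violations reported by other policy types.
        if violation.get("policy_type") != "bias_detection":
            continue
        lines.append(
            f"{violation['bias_type']} ({violation['severity']}, "
            f"confidence {violation['confidence']:.2f}): {violation['description']}"
        )
    return lines


# The example violation from this page, parsed from JSON.
example_response = json.loads("""
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "bias_detection",
      "severity": "medium",
      "description": "Gender bias detected: stereotyping based on gender roles",
      "bias_type": "gender",
      "confidence": 0.78
    }
  ]
}
""")

if not example_response["allowed"]:
    for line in summarize_violations(example_response):
        print(line)
```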

Best practices

  • Deploy in warning mode first to understand your baseline bias detection rate.
  • Use a lower threshold (e.g., 0.3) for HR, recruiting, or public-facing content where bias is high-risk.
  • Combine with Toxicity Detection for comprehensive harmful content coverage.
  • Review flagged outputs to identify systematic bias patterns in your LLM's responses.
  • Consider setting different thresholds per agent; customer support bots may need stricter controls than internal tools.
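
The threshold-related practices above can be sketched as follows. All agent names and threshold values except the documented default of 0.5 and the suggested 0.3 are hypothetical, the `"warning"` mode string is an assumption, and the sketch assumes a violation is raised when the confidence is greater than or equal to the threshold, which the documentation does not state.

```python
DEFAULT_THRESHOLD = 0.5  # documented default

# Hypothetical agents and values, chosen only for illustration.
AGENT_THRESHOLDS = {
    "recruiting_assistant": 0.3,   # high-risk content: stricter
    "customer_support_bot": 0.4,   # public-facing: somewhat stricter
    "internal_tool": 0.5,          # default sensitivity
}


def policy_for_agent(agent: str, mode: str = "warning") -> dict:
    """Build a bias_detection policy for an agent, falling back to the default.

    Assumption: "warning" is the literal mode value for warning mode.
    """
    threshold = AGENT_THRESHOLDS.get(agent, DEFAULT_THRESHOLD)
    return {
        "policy_type": "bias_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


def detection_rate(confidences: list[float], threshold: float) -> float:
    """Fraction of logged confidence scores that would be flagged.

    Assumption: a score is flagged when confidence >= threshold.
    """
    if not confidences:
        return 0.0
    return sum(c >= threshold for c in confidences) / len(confidences)


# Made-up confidence scores, standing in for a warning-mode log.
logged = [0.2, 0.35, 0.6, 0.78]
print(detection_rate(logged, 0.5))  # baseline at the default threshold
print(detection_rate(logged, 0.3))  # same traffic at a stricter threshold
```

Replaying warning-mode scores against several candidate thresholds in this way gives a rough sense of how many more outputs a stricter setting would flag before switching to blocking mode.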
