Detect gender, racial, age, religious, and political bias in LLM inputs and outputs.

Bias Detection

The Bias Detection provider identifies biased language and stereotyping across five dimensions: gender, race, age, religion, and politics. It uses zero-shot classification, so it can detect bias without requiring pre-defined examples for each category.

What it detects

Category	Examples
Gender bias	Stereotyping based on gender, sexist assumptions
Racial bias	Racial stereotypes, discriminatory generalizations
Age bias	Ageist assumptions, age-based discrimination
Religious bias	Religious stereotyping, faith-based prejudice
Political bias	Partisan framing, politically charged generalizations

Configuration

{
  "policy_type": "bias_detection",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

Parameter	Type	Default	Description
`threshold`	float	`0.5`	Confidence threshold (0–1). Lower values increase sensitivity to subtle bias but also flag more content overall.
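
If policies are generated programmatically, a small helper can catch an out-of-range threshold before the policy is submitted. The following is a minimal sketch, not part of the product: `build_bias_policy` is a hypothetical helper name, and the `"warning"` mode string is an assumption based on the warning mode mentioned under Best practices (only `"blocking"` appears in the configuration above).

```python
import json

# "blocking" comes from the configuration above; the exact spelling of the
# warning mode is an assumption.
ALLOWED_MODES = {"blocking", "warning"}


def build_bias_policy(threshold: float = 0.5, mode: str = "blocking") -> dict:
    """Return a bias_detection policy dict after validating its inputs."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return {
        "policy_type": "bias_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


if __name__ == "__main__":
    # A more sensitive policy, e.g. for recruiting content.
    print(json.dumps(build_bias_policy(threshold=0.3), indent=2))
```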

Example violation

{
  "allowed": false,
  "violations": [
    {
      "policy_type": "bias_detection",
      "severity": "medium",
      "description": "Gender bias detected: stereotyping based on gender roles",
      "bias_type": "gender",
      "confidence": 0.78
    }
  ]
}
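
A client might handle a response like the one above as sketched below. This is only an illustration: the field names are taken from the example violation, while the helper names `bias_violations` and `format_violation` and the `min_confidence` filter are hypothetical and not part of the provider.

```python
def bias_violations(response: dict, min_confidence: float = 0.0) -> list[dict]:
    """Return the bias_detection violations at or above min_confidence."""
    return [
        v
        for v in response.get("violations", [])
        if v.get("policy_type") == "bias_detection"
        and v.get("confidence", 0.0) >= min_confidence
    ]


def format_violation(violation: dict) -> str:
    """Render one violation as a single log-friendly line."""
    return (
        f"{violation['bias_type']} ({violation['severity']}, "
        f"confidence {violation['confidence']:.2f}): {violation['description']}"
    )


# The example violation from this page.
response = {
    "allowed": False,
    "violations": [
        {
            "policy_type": "bias_detection",
            "severity": "medium",
            "description": "Gender bias detected: stereotyping based on gender roles",
            "bias_type": "gender",
            "confidence": 0.78,
        }
    ],
}

if not response["allowed"]:
    for violation in bias_violations(response):
        print(format_violation(violation))
```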

Best practices

Deploy in warning mode first to measure your baseline bias detection rate before switching to blocking mode.
Use a lower threshold (e.g., 0.3) for HR, recruiting, or public-facing content where bias is high-risk.
Combine with Toxicity Detection for comprehensive harmful content coverage.
Review flagged outputs to identify systematic bias patterns in your LLM's responses.
Consider different thresholds per agent: customer support bots may need stricter controls (a lower threshold) than internal tools.
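
Per-agent thresholds can be kept in one lookup table and expanded into policies as needed. The sketch below assumes hypothetical agent names and a hypothetical `policy_for_agent` helper; the 0.3 value follows the recruiting guidance above, 0.5 is the documented default, and the 0.4 value is purely illustrative.

```python
# Hypothetical agent names mapped to thresholds. Lower means more sensitive.
AGENT_THRESHOLDS = {
    "recruiting_assistant": 0.3,  # high-risk content, per the guidance above
    "customer_support": 0.4,      # illustrative value, not a recommendation
    "internal_tools": 0.5,        # documented default
}


def policy_for_agent(
    agent: str, default_threshold: float = 0.5, mode: str = "blocking"
) -> dict:
    """Build a bias_detection policy using the agent's threshold, if any."""
    threshold = AGENT_THRESHOLDS.get(agent, default_threshold)
    return {
        "policy_type": "bias_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


if __name__ == "__main__":
    for name in AGENT_THRESHOLDS:
        print(name, policy_for_agent(name)["config"]["threshold"])
```

Agents missing from the table fall back to the default threshold, so adding a new agent never leaves it without a policy.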
