Bias Detection
Detect gender, racial, age, religious, and political bias in LLM inputs and outputs.
The Bias Detection provider identifies biased language and stereotyping across multiple dimensions. It uses zero-shot classification to detect bias without requiring pre-defined examples for each category.
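To illustrate what zero-shot classification means here, the following is a minimal sketch, not the provider's actual implementation. It scores a text against one natural-language label per category and keeps the labels at or above the threshold. The label wording, the `detect_bias` and `huggingface_scorer` helpers, the `>=` comparison, and the `facebook/bart-large-mnli` model name are all assumptions made for illustration; the model the provider uses is not documented on this page.

```python
from typing import Callable, Dict, List

# Hypothetical label set mirroring the five categories documented on this page.
BIAS_LABELS = [
    "gender bias",
    "racial bias",
    "age bias",
    "religious bias",
    "political bias",
]

# A scorer maps (text, labels) to one independent score in [0, 1] per label.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def huggingface_scorer(model_name: str = "facebook/bart-large-mnli") -> Scorer:
    """Build a scorer on the transformers zero-shot pipeline.

    Assumption: the model name is only an example choice.
    """
    from transformers import pipeline  # lazy import; the rest runs without it

    classifier = pipeline("zero-shot-classification", model=model_name)

    def score(text: str, labels: List[str]) -> Dict[str, float]:
        # multi_label=True scores each label independently of the others.
        result = classifier(text, candidate_labels=labels, multi_label=True)
        return dict(zip(result["labels"], result["scores"]))

    return score


def detect_bias(text: str, scorer: Scorer, threshold: float = 0.5) -> Dict[str, float]:
    """Return the labels whose score is at or above the threshold."""
    scores = scorer(text, BIAS_LABELS)
    return {label: s for label, s in scores.items() if s >= threshold}


# Stand-in scorer with fixed scores, so the example runs without a model download.
def fake_scorer(text: str, labels: List[str]) -> Dict[str, float]:
    return {label: (0.78 if label == "gender bias" else 0.1) for label in labels}


print(detect_bias("Example text", fake_scorer))  # {'gender bias': 0.78}
```

Because the labels are plain text, no per-category training examples are needed; swapping `fake_scorer` for `huggingface_scorer()` would run the same logic against a real model.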
What it detects
| Category | Examples |
|---|---|
| Gender bias | Stereotyping based on gender, sexist assumptions |
| Racial bias | Racial stereotypes, discriminatory generalizations |
| Age bias | Ageist assumptions, age-based discrimination |
| Religious bias | Religious stereotyping, faith-based prejudice |
| Political bias | Partisan framing, politically charged generalizations |
Configuration
{
  "policy_type": "bias_detection",
  "mode": "blocking",
  "config": {
    "threshold": 0.5
  }
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | float | 0.5 | Confidence threshold (0–1). Lower values increase sensitivity to subtle bias. |
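If policies are generated programmatically, a small helper can catch out-of-range values before the configuration is submitted. The sketch below is an assumption-laden illustration: `build_bias_policy` is a hypothetical helper, and it treats `blocking` and `warning` as the only valid modes because those are the two this page mentions. How the resulting JSON is sent to the service is not covered here.

```python
import json

# Assumption: only the two modes mentioned on this page are accepted.
VALID_MODES = {"blocking", "warning"}


def build_bias_policy(threshold: float = 0.5, mode: str = "blocking") -> dict:
    """Return a bias_detection policy dict, validating the documented fields."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return {
        "policy_type": "bias_detection",
        "mode": mode,
        "config": {"threshold": threshold},
    }


# A more sensitive policy in warning mode, serialized as JSON.
print(json.dumps(build_bias_policy(threshold=0.3, mode="warning"), indent=2))
```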
Example violation
{
  "allowed": false,
  "violations": [
    {
      "policy_type": "bias_detection",
      "severity": "medium",
      "description": "Gender bias detected: stereotyping based on gender roles",
      "bias_type": "gender",
      "confidence": 0.78
    }
  ]
}

Best practices
- Deploy in `warning` mode first to understand your baseline bias detection rate.
- Use a lower threshold (e.g., `0.3`) for HR, recruiting, or public-facing content where bias is high-risk.
- Combine with Toxicity Detection for comprehensive harmful content coverage.
- Review flagged outputs to identify systematic bias patterns in your LLM's responses.
- Consider different thresholds per agent: customer support bots may need stricter controls than internal tools.
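Reviewing flagged outputs for systematic patterns can start with a simple tally. The sketch below assumes you have collected responses shaped like the example violation above (with `violations`, `policy_type`, and `bias_type` fields) and counts bias violations per type. `summarize_violations` is a hypothetical helper, not part of any documented API.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_violations(results: Iterable[Mapping]) -> Counter:
    """Count bias_detection violations per bias_type across many responses."""
    counts: Counter = Counter()
    for result in results:
        for violation in result.get("violations", []):
            # Ignore violations raised by other policy types.
            if violation.get("policy_type") == "bias_detection":
                counts[violation.get("bias_type", "unknown")] += 1
    return counts


# Sample responses, modeled on the example violation shown earlier.
collected = [
    {"allowed": False, "violations": [
        {"policy_type": "bias_detection", "bias_type": "gender", "confidence": 0.78},
    ]},
    {"allowed": False, "violations": [
        {"policy_type": "bias_detection", "bias_type": "gender", "confidence": 0.61},
        {"policy_type": "bias_detection", "bias_type": "age", "confidence": 0.55},
    ]},
    {"allowed": True, "violations": []},
]

for bias_type, count in summarize_violations(collected).most_common():
    print(bias_type, count)
```

A category that dominates the tally is a reasonable first candidate for prompt changes or a per-agent threshold adjustment.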