Auto-Enforce Mode

Auto-enforce intercepts LLM SDK calls and evaluates them against your policies before they reach the model. If a blocking violation is detected, the call is rejected immediately: the SDK raises NyraxisBlockedError instead of forwarding the request.

How it works

1. You call nyraxis_sdk.init(enforce=True)
2. The SDK patches supported LLM client libraries
3. Every chat.completions.create() (or equivalent) is intercepted
4. Input is sent to Nyraxis for evaluation
5. If allowed: true → call proceeds normally
6. If allowed: false → NyraxisBlockedError is raised
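
The flow above can be illustrated with a small, self-contained sketch. It is not the Nyraxis implementation: FakeCompletions, evaluate, BlockedError and patch_create are hypothetical stand-ins, and the "policy" is a toy keyword check. It only shows the general pattern of wrapping a client method so that each call is evaluated first.

```python
import functools


class BlockedError(Exception):
    """Hypothetical stand-in for NyraxisBlockedError."""

    def __init__(self, violations):
        super().__init__(f"blocked: {violations}")
        self.violations = violations


class FakeCompletions:
    """Hypothetical stand-in for an LLM client's completions object."""

    def create(self, *, model, messages):
        return {"model": model, "reply": "ok"}


def evaluate(messages):
    """Toy policy check standing in for the remote evaluation call."""
    violations = [m["content"] for m in messages if "secret" in m["content"].lower()]
    return {"allowed": not violations, "violations": violations}


def patch_create(cls):
    """Replace cls.create with a wrapper that evaluates the input first."""
    original = cls.create

    @functools.wraps(original)
    def guarded(self, *args, **kwargs):
        verdict = evaluate(kwargs.get("messages", []))
        if not verdict["allowed"]:
            # Blocked: the original method is never invoked.
            raise BlockedError(verdict["violations"])
        return original(self, *args, **kwargs)

    cls.create = guarded


patch_create(FakeCompletions)
client = FakeCompletions()
```

After patch_create runs, existing call sites keep calling client.create(...) unchanged; only the behavior behind the method differs, which is why no application code has to be rewritten to opt in.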

Setup

import nyraxis_sdk

nyraxis_sdk.init(
    api_key="nyx_...",
    enforce=True,
    enforce_fail_open=True,       # pass through if Nyraxis is down
    enforce_timeout_s=5.0,        # max wait for evaluation
    enforce_cache_ttl_s=60.0,     # cache identical inputs
)

Handling blocked requests

import nyraxis_sdk
import openai

nyraxis_sdk.init(api_key="nyx_...", enforce=True)

client = openai.OpenAI()

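def answer(user_input: str) -> str:
    """Return the model's reply, or a safe fallback if the request is blocked."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
        )
    except nyraxis_sdk.NyraxisBlockedError as e:
        # Log the violation
        print(f"Blocked: {e.violations}")
        # Return a safe fallback to the user
        return "I can't help with that request."
    # Not blocked: return the text of the first choice
    return response.choices[0].message.content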

Fail-open vs fail-closed

Mode	When Nyraxis is unreachable
`enforce_fail_open=True` (default)	LLM call proceeds — no disruption
`enforce_fail_open=False`	LLM call is blocked — for regulated environments

Recommendation: Use enforce_fail_open=True in production unless compliance requires otherwise. Monitor Nyraxis uptime via the dashboard health endpoint.
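
The difference between the two modes can be sketched as a single decision function. This is an illustration under assumptions, not the SDK's code: guarded_verdict and its parameters are hypothetical names, and the evaluator is passed in as a plain callable. A timeout or connection failure falls back to the fail_open setting; a real verdict is always honored.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def guarded_verdict(evaluate, payload, *, fail_open=True, timeout_s=5.0):
    """Return True if the LLM call may proceed, False if it must be blocked.

    evaluate is any callable returning a dict with an "allowed" key.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(evaluate, payload)
        # Wait at most timeout_s for the evaluation to finish.
        return bool(future.result(timeout=timeout_s)["allowed"])
    except (FutureTimeout, OSError):
        # Evaluator timed out or was unreachable (ConnectionError is an
        # OSError): proceed only when configured to fail open.
        return fail_open
    finally:
        # Do not block the caller on a still-running evaluation.
        pool.shutdown(wait=False)
```

Note that the fallback applies only to availability failures. An explicit allowed: false verdict blocks the call in both modes.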

Caching

Identical inputs within the cache TTL are not re-evaluated. This reduces latency for repeated queries (e.g., retries, form resubmissions).

Set enforce_cache_ttl_s=0 to disable caching entirely.
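
As a rough illustration of the caching behavior, the sketch below keys verdicts on a hash of the input and expires them after a TTL. How the SDK actually builds its cache key is not documented here, so the SHA-256-of-JSON key, the VerdictCache class and the injectable clock are all assumptions made for the example.

```python
import hashlib
import json
import time


class VerdictCache:
    """Hypothetical in-memory TTL cache for evaluation verdicts."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (stored_at, verdict)

    @staticmethod
    def _key(messages):
        # Identical inputs serialize to identical JSON, hence the same key.
        blob = json.dumps(messages, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get_or_evaluate(self, messages, evaluate):
        if self.ttl_s <= 0:
            # A TTL of 0 disables caching: always evaluate.
            return evaluate(messages)
        key = self._key(messages)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # still fresh, skip re-evaluation
        verdict = evaluate(messages)
        self._entries[key] = (now, verdict)
        return verdict
```

One consequence of any cache of this shape: a policy change may not take effect for an already-cached input until its entry expires, so a shorter TTL trades latency for faster policy propagation.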

Supported SDKs

Library	Supported methods
`openai` v1+	`chat.completions.create`, `completions.create`
`anthropic`	`messages.create`
`langchain-openai`	`ChatOpenAI.invoke`

Performance impact

P50 latency: +40-80ms per call
P99 latency: +150ms per call
Cache hit: +0ms (served from local cache)

The SDK evaluates asynchronously where possible. Output evaluation happens after the response is received.
