Auto-Enforce Mode
Protect every LLM call automatically without changing your agent code.
Auto-enforce intercepts LLM SDK calls and evaluates them against your policies before they reach the model. If a blocking violation is detected, the call is rejected immediately.
How it works
- You call nyraxis_sdk.init(enforce=True)
- The SDK patches supported LLM client libraries
- Every chat.completions.create() (or equivalent) is intercepted
- Input is sent to Nyraxis for evaluation
- If allowed: true → call proceeds normally
- If allowed: false → NyraxisBlockedError is raised
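
The flow above can be illustrated with a small, self-contained sketch of method patching. This is not the Nyraxis implementation: `FakeCompletions`, `fake_evaluate`, `patch_create`, and `BlockedError` are hypothetical stand-ins chosen for illustration, and the real SDK sends the input to the Nyraxis service rather than checking it locally.

```python
import functools


class BlockedError(Exception):
    """Hypothetical stand-in for NyraxisBlockedError."""

    def __init__(self, violations):
        super().__init__(f"blocked: {violations}")
        self.violations = violations


class FakeCompletions:
    """Stand-in for an LLM client's completions resource (not a real SDK)."""

    def create(self, *, model, messages):
        return {"model": model, "echo": messages[-1]["content"]}


def fake_evaluate(messages):
    """Hypothetical local policy check; the real check is a remote evaluation."""
    text = " ".join(m["content"] for m in messages)
    if "ssn" in text.lower():
        return {"allowed": False, "violations": ["pii"]}
    return {"allowed": True, "violations": []}


def patch_create(cls, evaluate):
    """Replace cls.create with a wrapper that evaluates the input first."""
    original = cls.create

    @functools.wraps(original)
    def guarded(self, *args, **kwargs):
        verdict = evaluate(kwargs.get("messages", []))
        if not verdict["allowed"]:
            # Rejected before the underlying client method ever runs.
            raise BlockedError(verdict["violations"])
        return original(self, *args, **kwargs)

    cls.create = guarded


patch_create(FakeCompletions, fake_evaluate)
client = FakeCompletions()
ok = client.create(model="demo", messages=[{"role": "user", "content": "hello"}])
```

Because the patch is applied to the class, existing call sites keep working unchanged, which is the property the "without changing your agent code" claim relies on.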
Setup
```python
import nyraxis_sdk

nyraxis_sdk.init(
    api_key="nyx_...",
    enforce=True,
    enforce_fail_open=True,    # pass through if Nyraxis is down
    enforce_timeout_s=5.0,     # max wait for evaluation
    enforce_cache_ttl_s=60.0,  # cache identical inputs
)
```
Handling blocked requests
```python
import nyraxis_sdk
import openai

nyraxis_sdk.init(api_key="nyx_...", enforce=True)
client = openai.OpenAI()


def answer(user_input: str) -> str:
    # The function wrapper is illustrative; the return statements need an enclosing function.
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
        )
    except nyraxis_sdk.NyraxisBlockedError as e:
        # Log the violation
        print(f"Blocked: {e.violations}")
        # Return a safe fallback to the user
        return "I can't help with that request."
    return response.choices[0].message.content or ""
```
Fail-open vs fail-closed
| Mode | When Nyraxis is unreachable |
|---|---|
| enforce_fail_open=True (default) | LLM call proceeds, no disruption |
| enforce_fail_open=False | LLM call is blocked; intended for regulated environments |
Recommendation: Use enforce_fail_open=True in production unless compliance requires otherwise. Monitor Nyraxis uptime via the dashboard health endpoint.
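
The difference between the two modes comes down to what happens when no verdict is available. The sketch below shows that decision in isolation. The `may_proceed` helper is hypothetical, and treating a timeout the same as an unreachable service is an assumption, not something the SDK documentation confirms.

```python
def may_proceed(evaluate, payload, *, fail_open=True):
    """Return True if the LLM call may go ahead, False if it must be blocked.

    `evaluate` is any callable returning {"allowed": bool}; it stands in for
    the remote evaluation and may raise if the service cannot be reached.
    """
    try:
        verdict = evaluate(payload)
    except (TimeoutError, ConnectionError):
        # No verdict available (assumed: timeouts count as unreachable).
        # Fail-open lets the call through; fail-closed blocks it.
        return fail_open
    return bool(verdict["allowed"])


def unreachable(_payload):
    """Simulates an outage of the evaluation service."""
    raise ConnectionError("evaluation service unreachable")


open_result = may_proceed(unreachable, "hello", fail_open=True)
closed_result = may_proceed(unreachable, "hello", fail_open=False)
```

Note that the mode only matters on the error path: when a verdict is returned, it is honored in both modes.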
Caching
Identical inputs within the cache TTL are not re-evaluated. This reduces latency for repeated queries (e.g., retries, form resubmissions).
Set enforce_cache_ttl_s=0 to disable caching entirely.
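
As a rough illustration of TTL-based caching, the sketch below keys verdicts by a hash of the input and expires them after a fixed interval. The `VerdictCache` class and its keying scheme are assumptions made for this example; the SDK's actual cache key and eviction behavior are not documented here.

```python
import hashlib
import json
import time


class VerdictCache:
    """Hypothetical in-memory cache of evaluation verdicts with a TTL."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, verdict)

    @staticmethod
    def key_for(messages):
        # Identical inputs serialize to identical bytes, hence the same key.
        blob = json.dumps(messages, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get(self, messages):
        if self.ttl_s <= 0:      # a TTL of 0 disables caching entirely
            return None
        key = self.key_for(messages)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, verdict = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: force re-evaluation
            return None
        return verdict

    def put(self, messages, verdict):
        if self.ttl_s <= 0:
            return
        key = self.key_for(messages)
        self._entries[key] = (self._clock() + self.ttl_s, verdict)
```

A caller would check `get()` before evaluating and call `put()` with the fresh verdict on a miss.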
Supported SDKs
| Library | Supported methods |
|---|---|
| openai v1+ | chat.completions.create, completions.create |
| anthropic | messages.create |
| langchain-openai | ChatOpenAI.invoke |
Performance impact
- P50 latency: +40-80ms per call
- P99 latency: +150ms per call
- Cache hit: +0ms (served from local cache)
The SDK evaluates asynchronously where possible. Output evaluation happens after the response is received.
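
To compare these figures against your own workload, one option is to time the same call with and without enforcement and look at the percentiles. The helpers below are a generic measurement sketch using only the Python standard library; `time_calls` and `p50_p99` are hypothetical names, not part of the SDK.

```python
import statistics
import time


def time_calls(call, runs=200):
    """Invoke `call` repeatedly and return per-call latencies in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def p50_p99(samples_ms):
    """Return (P50, P99) of the samples, interpolating within the data range."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[98]  # 50th and 99th of the 99 cut points
```

Running `p50_p99(time_calls(your_call))` once with enforcement on and once with it off gives two pairs whose differences approximate the added latency; include enough runs for the P99 estimate to be meaningful.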