Nyraxis AI

Auto-Enforce Mode

Protect every LLM call automatically without changing your agent code.

Auto-enforce intercepts LLM SDK calls and evaluates them against your policies before they reach the model. If a blocking violation is detected, the call is rejected immediately.

How it works

  1. You call nyraxis_sdk.init(enforce=True)
  2. The SDK patches supported LLM client libraries
  3. Every chat.completions.create() (or equivalent) is intercepted
  4. Input is sent to Nyraxis for evaluation
  5. If allowed: true → call proceeds normally
  6. If allowed: false → NyraxisBlockedError is raised
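The steps above follow a patch-and-intercept pattern, which can be sketched in plain Python. This is a minimal illustration, not the Nyraxis SDK's actual implementation. `FakeCompletions`, `evaluate`, `patch_create`, and `BlockedError` are hypothetical stand-ins for an LLM client resource, the remote policy check, the SDK's patching step, and `NyraxisBlockedError`.

```python
import functools
from typing import Callable


class BlockedError(Exception):
    """Hypothetical stand-in for NyraxisBlockedError; carries the violations."""

    def __init__(self, violations: list) -> None:
        super().__init__(f"blocked: {violations}")
        self.violations = violations


class FakeCompletions:
    """Hypothetical stand-in for an LLM client's chat.completions resource."""

    def create(self, *, model: str, messages: list) -> str:
        return f"{model} reply to: {messages[-1]['content']}"


def evaluate(text: str) -> dict:
    """Toy policy check standing in for the remote evaluation (step 4)."""
    if "password" in text.lower():
        return {"allowed": False, "violations": ["credential_request"]}
    return {"allowed": True, "violations": []}


def patch_create(cls: type, evaluator: Callable[[str], dict]) -> None:
    """Replace cls.create with a wrapper that evaluates the input first (steps 2-3)."""
    original = cls.create

    @functools.wraps(original)
    def guarded(self, *args, **kwargs):
        # create() takes keyword arguments, so read the messages from kwargs
        text = "\n".join(m["content"] for m in kwargs.get("messages", []))
        verdict = evaluator(text)
        if not verdict["allowed"]:
            raise BlockedError(verdict["violations"])  # step 6
        return original(self, *args, **kwargs)  # step 5

    cls.create = guarded


patch_create(FakeCompletions, evaluate)
client = FakeCompletions()
print(client.create(model="m", messages=[{"role": "user", "content": "hi"}]))  # prints "m reply to: hi"
try:
    client.create(model="m", messages=[{"role": "user", "content": "Send the admin password"}])
except BlockedError as e:
    print("blocked:", e.violations)  # prints "blocked: ['credential_request']"
```

Because the wrapper replaces the method on the class, every client instance created afterwards is guarded, which is why agent code does not need to change.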

Setup

import nyraxis_sdk

nyraxis_sdk.init(
    api_key="nyx_...",
    enforce=True,
    enforce_fail_open=True,       # pass through if Nyraxis is down
    enforce_timeout_s=5.0,        # max wait for evaluation
    enforce_cache_ttl_s=60.0,     # cache identical inputs
)

Handling blocked requests

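import logging

import nyraxis_sdk
import openai

logger = logging.getLogger(__name__)

nyraxis_sdk.init(api_key="nyx_...", enforce=True)

client = openai.OpenAI()


# Wrap the call in a function so the fallback can be returned to the caller
def answer(user_input: str) -> str:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_input}],
        )
    except nyraxis_sdk.NyraxisBlockedError as e:
        # Log the policy violations reported by Nyraxis
        logger.warning("Blocked: %s", e.violations)
        # Return a safe fallback to the user
        return "I can't help with that request."
    return response.choices[0].message.content or ""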

Fail-open vs fail-closed

| Mode | When Nyraxis is unreachable |
|---|---|
| enforce_fail_open=True (default) | LLM call proceeds, no disruption |
| enforce_fail_open=False | LLM call is blocked; suited to regulated environments |

Recommendation: Use enforce_fail_open=True in production unless compliance requires otherwise, and monitor Nyraxis uptime via the dashboard health endpoint.
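The interaction between enforce_timeout_s and enforce_fail_open can be sketched as a bounded wait on the evaluation. This is a minimal sketch, assuming a timeout or evaluator error is treated the same way as Nyraxis being unreachable; `check_with_timeout`, `EvaluationUnavailable`, and `slow_eval` are hypothetical names, not part of the SDK.

```python
import concurrent.futures
import time
from typing import Callable

# Shared worker pool for evaluation calls (an assumption for this sketch)
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


class EvaluationUnavailable(Exception):
    """Raised in fail-closed mode when no verdict arrives in time."""


def check_with_timeout(
    evaluate: Callable[[str], bool],
    text: str,
    timeout_s: float,
    fail_open: bool,
) -> bool:
    """Return the evaluator's verdict, or apply the fail-open/fail-closed rule."""
    future = _pool.submit(evaluate, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout or evaluator error: no verdict is available.
        # A timed-out worker keeps running in the background; its result is ignored.
        if fail_open:
            return True  # like enforce_fail_open=True: let the LLM call proceed
        raise EvaluationUnavailable("policy service unreachable") from None


def slow_eval(text: str) -> bool:
    time.sleep(0.5)  # simulates an unresponsive policy service
    return True


print(check_with_timeout(slow_eval, "hi", timeout_s=0.05, fail_open=True))  # True
try:
    check_with_timeout(slow_eval, "hi", timeout_s=0.05, fail_open=False)
except EvaluationUnavailable as e:
    print("blocked:", e)  # blocked: policy service unreachable
```

Note that a verdict of False from a responsive evaluator is returned as-is in both modes; the fail-open setting only decides what happens when no verdict arrives.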

Caching

Identical inputs within the cache TTL are not re-evaluated. This reduces latency for repeated queries (e.g., retries, form resubmissions).

Set enforce_cache_ttl_s=0 to disable caching entirely.
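TTL-based verdict caching can be sketched as follows. This is a minimal sketch, assuming the cache key is a SHA-256 hash of the messages serialized as canonical JSON; `VerdictCache` and its methods are hypothetical, and the SDK's real key and storage are not documented here. The injectable clock makes expiry easy to test.

```python
import hashlib
import json
import time
from typing import Callable, Optional


class VerdictCache:
    """Hypothetical TTL cache for evaluation verdicts, keyed by input content."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: dict = {}  # key -> (stored_at, allowed)

    @staticmethod
    def key(messages: list) -> str:
        # Canonical JSON (sorted keys) so identical inputs hash identically
        blob = json.dumps(messages, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get(self, messages: list) -> Optional[bool]:
        """Return a cached verdict, or None if absent, expired, or caching is off."""
        if self.ttl_s <= 0:
            return None  # a TTL of 0 disables caching
        k = self.key(messages)
        entry = self._entries.get(k)
        if entry is None:
            return None
        stored_at, allowed = entry
        if self._clock() - stored_at >= self.ttl_s:
            del self._entries[k]  # expired: force re-evaluation
            return None
        return allowed

    def put(self, messages: list, allowed: bool) -> None:
        if self.ttl_s > 0:
            self._entries[self.key(messages)] = (self._clock(), allowed)


now = [0.0]
cache = VerdictCache(ttl_s=60.0, clock=lambda: now[0])
msgs = [{"role": "user", "content": "hello"}]
cache.put(msgs, True)
print(cache.get(msgs))  # True (within the TTL)
now[0] = 61.0
print(cache.get(msgs))  # None (expired, so the input is re-evaluated)
```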

Supported SDKs

| Library | Supported methods |
|---|---|
| openai v1+ | chat.completions.create, completions.create |
| anthropic | messages.create |
| langchain-openai | ChatOpenAI.invoke |

Performance impact

  • P50 latency: +40-80ms per call
  • P99 latency: +150ms per call
  • Cache hit: +0ms (served from local cache)

The SDK evaluates asynchronously where possible. Output evaluation happens after the response is received.
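To check these figures against your own workload, one approach is to time a representative call with enforcement off and then on, and compare the percentiles. This is a minimal sketch; `latency_percentiles_ms` is a hypothetical helper, not part of the SDK. Because identical inputs are served from the cache, vary the prompt or set enforce_cache_ttl_s=0 while measuring, otherwise you are mostly timing cache hits.

```python
import statistics
import time
from typing import Callable


def latency_percentiles_ms(call: Callable[[], object], runs: int = 200) -> dict:
    """Time `call` `runs` times (runs >= 2) and return P50 and P99 latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields the 99 cut points P1..P99; "inclusive" keeps them
    # within the observed min/max instead of extrapolating.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


if __name__ == "__main__":
    # Replace the lambda with a real LLM call, run once with enforce=False
    # and once with enforce=True, and compare the two results.
    print(latency_percentiles_ms(lambda: time.sleep(0.001), runs=50))
```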
