# Production Patterns

Recommended patterns for running Highflame Shield and ZeroID in production environments.

### Client Initialization

Create one `Highflame` client per process and share it across requests. The client maintains an internal token cache and connection pool — creating a new instance per request wastes resources and loses token caching.

**Python — module-level singleton:**

```python
# highflame_client.py
import os
from highflame import Highflame

_client: Highflame | None = None

def get_client() -> Highflame:
    global _client
    if _client is None:
        _client = Highflame(
            api_key=os.environ["HIGHFLAME_API_KEY"],
            account_id=os.environ.get("HIGHFLAME_ACCOUNT_ID"),
            project_id=os.environ.get("HIGHFLAME_PROJECT_ID"),
        )
    return _client
```

**TypeScript — module-level singleton:**

```typescript
// highflame.ts
import { Highflame } from "@highflame/sdk";

export const client = new Highflame({
  apiKey: process.env.HIGHFLAME_API_KEY!,
  accountId: process.env.HIGHFLAME_ACCOUNT_ID,
  projectId: process.env.HIGHFLAME_PROJECT_ID,
});
```
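The Python `get_client()` helper above is lazy but has no lock, so two threads calling it at the same moment during startup could each construct a client. If your process is multi-threaded, you can guard construction with a lock. This is a minimal sketch. `LazySingleton` is a hypothetical helper, not part of the SDK, and `factory` stands in for something like `lambda: Highflame(api_key=...)`.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Hypothetical helper: builds a value on first use and reuses it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: already built, so no locking is needed
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check inside the lock: another thread may have built it first
            if self._value is None:
                self._value = self._factory()
            return self._value
```

In `highflame_client.py` this could replace the hand-written global, for example `get_client = LazySingleton(lambda: Highflame(api_key=os.environ["HIGHFLAME_API_KEY"])).get`.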

### Async Context Managers

Use async context managers when you need guaranteed resource cleanup (connection pool teardown) at process shutdown:

```python
import os
from highflame import Highflame

async def main():
    async with Highflame(api_key=os.environ["HIGHFLAME_API_KEY"]) as client:
        # client is cleaned up on exit, even on exception
        ...
```

### Enforcement Rollout Strategy

Roll out enforcement in stages to avoid blocking legitimate traffic while policies are being tuned:

**Stage 1 — Monitor**: observe decisions without blocking. Review the observatory to understand what your traffic looks like.

```python
# Set mode per request — there is no default_mode client option
resp = client.guard.evaluate(GuardRequest(..., mode="monitor"))
```

**Stage 2 — Alert**: allow traffic but trigger your alerting pipeline on violations. Tune your response playbooks before enforcement.

```python
resp = client.guard.evaluate(GuardRequest(..., mode="alert"))
if resp.alerted:
    notify_security_team(resp)
```

**Stage 3 — Enforce**: block violations. Roll out to a subset of traffic first (canary), then expand.

```python
resp = client.guard.evaluate(GuardRequest(..., mode="enforce"))
if resp.denied:
    raise PermissionError(resp.policy_reason)
```
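One way to run the Stage 3 canary is to choose `mode` per request from a stable hash of the session ID. A given session then stays in the same bucket as you raise the percentage. This is a minimal sketch that assumes you pass the returned string as `GuardRequest(..., mode=...)`. The names `choose_mode` and `enforce_percent` are hypothetical, not SDK options.

```python
import hashlib


def choose_mode(session_id: str, enforce_percent: int) -> str:
    """Hypothetical helper: "enforce" for about `enforce_percent`% of sessions, else "alert".

    Hashing gives each session a stable bucket, so raising the percentage
    only moves more sessions into enforcement; none move back out.
    """
    if not 0 <= enforce_percent <= 100:
        raise ValueError("enforce_percent must be between 0 and 100")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "enforce" if bucket < enforce_percent else "alert"
```

For example, `client.guard.evaluate(GuardRequest(..., mode=choose_mode(session_id, 10)))` would enforce for roughly one session in ten. Sessions outside the canary stay in alert mode, so violations there still reach your alerting pipeline.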

### Non-Blocking Guardrails in Critical Paths

If your latency budget cannot absorb a sequential Shield call in the hot path, start the evaluation concurrently with other preparation work so its latency overlaps with that work. Then await the result before calling the LLM:

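```python
import asyncio
from highflame import GuardRequest
from highflame_client import get_client  # the singleton module from Client Initialization

client = get_client()

# load_context and call_llm stand in for your application's own async functions
async def handle_request(prompt: str, session_id: str):
    # Start guard evaluation concurrently with other work
    guard_task = asyncio.create_task(
        client.guard.aevaluate(GuardRequest(
            content=prompt,
            content_type="prompt",
            action="process_prompt",
            session_id=session_id,
        ))
    )

    try:
        # Do other prep work while Shield evaluates the prompt
        context = await load_context(session_id)
    except BaseException:
        # Cancel the guard task so it is not left running unobserved
        guard_task.cancel()
        raise

    # Now wait for the guard result before calling the LLM
    resp = await guard_task
    if resp.denied:
        raise PermissionError(resp.policy_reason)

    return await call_llm(prompt, context)
```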

### Graceful Degradation

When Shield is unavailable (network partition, service outage), decide whether to fail open or fail closed based on your risk tolerance:

```python
import logging

from highflame import APIConnectionError, GuardRequest, GuardResponse
from highflame_client import get_client

logger = logging.getLogger(__name__)
client = get_client()

async def evaluate_with_fallback(request: GuardRequest) -> GuardResponse | None:
    try:
        return await client.guard.aevaluate(request)
    except APIConnectionError:
        # Fail open: allow the request through and log for later review
        logger.warning("Shield unreachable; failing open for request")
        return None  # caller treats None as allowed
```

For high-security applications, fail closed:

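```python
async def evaluate_or_block(request: GuardRequest) -> GuardResponse:
    try:
        return await client.guard.aevaluate(request)
    except APIConnectionError as exc:
        # Fail closed: block the request when Shield cannot be reached
        raise RuntimeError("Security service unavailable; request blocked") from exc
```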
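A connection error is not the only way Shield can miss your latency budget; a slow response can too. The sketch below bounds the wait with `asyncio.wait_for` and applies one fail-open or fail-closed policy to both cases. It is SDK-agnostic, and its names are hypothetical. `evaluate` stands in for `lambda: client.guard.aevaluate(request)`. In real code, `connection_errors` would be `(APIConnectionError,)`. `guarded_evaluate`, `budget_s`, and `fail_open` are illustrative names.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple, Type, TypeVar

logger = logging.getLogger(__name__)
R = TypeVar("R")


async def guarded_evaluate(
    evaluate: Callable[[], Awaitable[R]],
    *,
    budget_s: float,
    fail_open: bool,
    connection_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Optional[R]:
    """Hypothetical wrapper: run a guard evaluation within `budget_s` seconds.

    Returns the response, or None when Shield is slow or unreachable and
    `fail_open` is True (callers treat None as allowed). Raises RuntimeError
    in the same situations when `fail_open` is False.
    """
    try:
        # wait_for cancels the evaluation if it exceeds the budget
        return await asyncio.wait_for(evaluate(), timeout=budget_s)
    except (asyncio.TimeoutError, *connection_errors) as exc:
        if fail_open:
            logger.warning("Shield unavailable (%r); failing open", exc)
            return None
        raise RuntimeError("Security service unavailable; request blocked") from exc
```

Treating a timeout the same as an outage keeps one rule in effect: whatever your fail-open or fail-closed decision is, it also holds when Shield is merely slow.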

### Environment Variable Reference

Configure the SDK through environment variables to keep credentials out of code:

| Variable               | Description                        |
| ---------------------- | ---------------------------------- |
| `HIGHFLAME_API_KEY`    | Service key (`hf_sk_...`)          |
| `HIGHFLAME_ACCOUNT_ID` | Account ID for multi-tenant setups |
| `HIGHFLAME_PROJECT_ID` | Project ID for multi-tenant setups |

For self-hosted deployments, also configure:

| Variable              | Description                 |
| --------------------- | --------------------------- |
| `HIGHFLAME_BASE_URL`  | Guard service base URL      |
| `HIGHFLAME_TOKEN_URL` | Token exchange endpoint URL |

```bash
export HIGHFLAME_API_KEY="hf_sk_..."
export HIGHFLAME_ACCOUNT_ID="acc_prod"
export HIGHFLAME_PROJECT_ID="proj_main"
```

The Python SDK reads these automatically if you pass `api_key=None`:

```python
from highflame import Highflame

# With api_key=None, the SDK reads HIGHFLAME_API_KEY, HIGHFLAME_ACCOUNT_ID,
# and HIGHFLAME_PROJECT_ID from the environment
client = Highflame(api_key=None)
```
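To catch a missing or mistyped variable at startup rather than on the first guarded request, you can validate the environment once before you construct the client. The following is a minimal sketch. `HighflameSettings` and `load_settings` are hypothetical helpers. Two checks are assumptions drawn from the tables above, not rules the SDK enforces: the `hf_sk_` prefix check, and the requirement that both self-hosted URLs are set together.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HighflameSettings:
    api_key: str
    account_id: Optional[str]
    project_id: Optional[str]
    base_url: Optional[str]
    token_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> HighflameSettings:
    """Hypothetical helper: read Highflame variables and fail fast on obvious mistakes."""
    api_key = env.get("HIGHFLAME_API_KEY", "")
    if not api_key:
        raise RuntimeError("HIGHFLAME_API_KEY is not set")
    # Assumption: service keys use the hf_sk_ prefix shown in the table above
    if not api_key.startswith("hf_sk_"):
        raise RuntimeError("HIGHFLAME_API_KEY does not look like a service key (hf_sk_...)")
    base_url = env.get("HIGHFLAME_BASE_URL")
    token_url = env.get("HIGHFLAME_TOKEN_URL")
    # Assumption: self-hosted deployments set both URLs or neither
    if bool(base_url) != bool(token_url):
        raise RuntimeError("Set both HIGHFLAME_BASE_URL and HIGHFLAME_TOKEN_URL, or neither")
    return HighflameSettings(
        api_key=api_key,
        account_id=env.get("HIGHFLAME_ACCOUNT_ID"),
        project_id=env.get("HIGHFLAME_PROJECT_ID"),
        base_url=base_url,
        token_url=token_url,
    )
```

You could then pass `settings.api_key`, `settings.account_id`, and `settings.project_id` to the `Highflame(...)` constructor used in Client Initialization.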

### Logging

The TypeScript SDK accepts a `logger` object with standard `debug`/`info`/`warn` methods:

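```typescript
import { Highflame } from "@highflame/sdk";

// appLogger: your application's existing logger (Winston, Pino, etc.)
const client = new Highflame({
  apiKey: process.env.HIGHFLAME_API_KEY!,
  logger: {
    debug: (msg, ...args) => appLogger.debug(msg, ...args),
    info:  (msg, ...args) => appLogger.info(msg, ...args),
    warn:  (msg, ...args) => appLogger.warn(msg, ...args),
  },
});
```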

Pass your existing logger (Winston, Pino, etc.) to route SDK logs through your application's logging pipeline.

### Token Caching (ZeroID)

The Highflame SDK exchanges service keys for short-lived JWTs and caches them automatically. The cache refreshes 60 seconds before expiry (`TOKEN_REFRESH_BUFFER_MS = 60_000`). Concurrent requests share a single in-flight token exchange — there is no thundering-herd problem at startup.
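The refresh buffer and the shared in-flight exchange can be illustrated in a few lines of `asyncio`. This sketch shows the technique; it is not the SDK's code. `TokenCache` and the `exchange` callable are hypothetical. `exchange` stands in for the service-key-to-JWT exchange and is assumed to return a token plus its expiry as a Unix timestamp. The 60-second default mirrors `TOKEN_REFRESH_BUFFER_MS`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical exchange: returns (token, expires_at as Unix seconds)
Exchange = Callable[[], Awaitable[Tuple[str, float]]]


class TokenCache:
    """Hypothetical cache for a short-lived token, refreshed shortly before expiry."""

    def __init__(self, exchange: Exchange, refresh_buffer_s: float = 60.0) -> None:
        self._exchange = exchange
        self._buffer = refresh_buffer_s
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._inflight: Optional["asyncio.Task[str]"] = None

    def _is_fresh(self) -> bool:
        # Stale once we are within `refresh_buffer_s` seconds of expiry
        return self._token is not None and time.time() < self._expires_at - self._buffer

    async def get(self) -> str:
        if self._is_fresh():
            return self._token  # type: ignore[return-value]
        # Single flight: the first caller starts the exchange and later callers
        # await the same task. No lock is needed because nothing is awaited
        # between this check and the assignment.
        if self._inflight is None or self._inflight.done():
            self._inflight = asyncio.create_task(self._refresh())
        # shield() stops one caller's cancellation from cancelling the shared exchange
        return await asyncio.shield(self._inflight)

    async def _refresh(self) -> str:
        token, expires_at = await self._exchange()
        self._token, self._expires_at = token, expires_at
        return token
```

However many coroutines call `get()` at startup, only one exchange runs, and they all receive its token.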

ZeroID clients apply similar caching on the verification side: `tokens.verify()` fetches the JWKS once and caches the signing keys. The cache is invalidated when the JWKS rotates (detected via `kid` rotation).
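The `kid`-keyed caching can be sketched as follows. `JwksCache` is illustrative rather than ZeroID's implementation, and `fetch_jwks` is a hypothetical callable that returns the parsed JWKS document (`{"keys": [...]}`). Signature verification itself is left to a JWT library; the sketch only shows when keys are fetched.

```python
from typing import Any, Callable, Dict, Optional

Jwk = Dict[str, Any]


class JwksCache:
    """Hypothetical cache of signing keys by `kid`; refetches when an unknown `kid` appears."""

    def __init__(self, fetch_jwks: Callable[[], Dict[str, Any]]) -> None:
        self._fetch = fetch_jwks
        self._keys: Dict[str, Jwk] = {}

    def _reload(self) -> None:
        jwks = self._fetch()
        # Replace the whole map so retired keys are dropped after a rotation
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}

    def key_for(self, kid: str) -> Optional[Jwk]:
        key = self._keys.get(kid)
        if key is None:
            # Unknown kid: the issuer may have rotated keys, so refetch once
            self._reload()
            key = self._keys.get(kid)
        return key
```

A production version would also rate-limit refetches, so that tokens carrying bogus `kid` values cannot trigger a JWKS fetch on every request.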

### Production Checklist

* [ ] One `Highflame` client per process (not per request)
* [ ] API key stored in secrets management (not in code)
* [ ] `account_id` and `project_id` set explicitly (not relying on key defaults)
* [ ] Session IDs scoped to user + conversation (not global)
* [ ] `max_retries` left at the default of `2` or higher, to tolerate transient errors
* [ ] `timeout` configured to fit your SLA budget
* [ ] Enforcement mode rolled out in stages: monitor → alert → enforce
* [ ] Graceful degradation strategy defined (fail open or fail closed)
* [ ] Policies verified with `client.debug.policies()` at startup
* [ ] Observatory reviewed before go-live
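For the session ID item, here is a minimal sketch of an ID scoped to one user and one conversation. `make_session_id` and the `sess_` prefix are hypothetical. What matters is that two users, or two conversations of the same user, never share an ID.

```python
import hashlib
import json


def make_session_id(user_id: str, conversation_id: str) -> str:
    """Hypothetical helper: a stable session ID unique to one user and one conversation.

    JSON-encoding the pair keeps ("a:b", "c") and ("a", "b:c") distinct,
    and hashing avoids sending raw user IDs as the session ID.
    """
    payload = json.dumps([user_id, conversation_id]).encode("utf-8")
    return "sess_" + hashlib.sha256(payload).hexdigest()[:32]
```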

