# Securing Model Calls

Every LLM request routed through the Agent Gateway passes through a layered set of security controls — before the request reaches the model provider, during provider interaction, and before the response is returned to the caller. This page documents each control layer.

***

## Provider credential isolation

Applications never handle model provider API keys. When you register a provider in Highflame Studio, the API key is stored in the **Secrets Vault** (backed by AWS Secrets Manager with KMS encryption at rest). Applications reference a named route; the Gateway resolves the route to a provider and retrieves the credential from the vault at request time.

This means:

* A compromised agent or application cannot leak the underlying model API key — it never has access to it
* Rotating a provider key requires updating it in one place (the vault), not across every application
* If the secrets detector fires on an inbound request, the request contains a key that the caller should not hold in the first place, so the detection itself flags a likely exfiltration attempt
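
The resolution flow described above can be sketched roughly as follows. This is a minimal illustration, not the Gateway's implementation: `ROUTES`, `VAULT`, `ResolvedCall`, and `resolve_route` are hypothetical names, the provider and secret names are placeholders, and plain dictionaries stand in for the Studio route configuration and the Secrets Vault.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for Studio route config and the Secrets Vault.
ROUTES = {"support-chat": {"provider": "example-provider", "secret_name": "providers/example"}}
VAULT = {"providers/example": "sk-example-not-a-real-key"}


@dataclass(frozen=True)
class ResolvedCall:
    provider: str
    api_key: str  # held only inside the gateway process


def resolve_route(route_name: str) -> ResolvedCall:
    """Map a caller-supplied route name to a provider and its vaulted credential."""
    route = ROUTES.get(route_name)
    if route is None:
        raise KeyError(f"unknown route: {route_name}")
    # The credential is looked up at request time and never returned to the caller.
    return ResolvedCall(route["provider"], VAULT[route["secret_name"]])
```

The caller only ever supplies `route_name`; the key appears solely in the value the gateway uses to call the provider.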

***

## Caller authentication

Every request to the Gateway must present a valid credential before any model call is initiated. Authentication is resolved in the following priority order:

| Priority | Method               | Format                                        |
| -------- | -------------------- | --------------------------------------------- |
| 1        | Highflame API key    | `X-Highflame-APIKey: hf_sk_...`               |
| 2        | Service key (bearer) | `Authorization: Bearer hf_sk_...`             |
| 3        | ZeroID JWT (OAuth)   | `Authorization: Bearer eyJ...` (RS256-signed) |

Unauthenticated requests receive a `401` before any downstream processing occurs. JWTs are validated against the Highflame JWKS endpoint; signing keys are refreshed every 5 minutes, with a lazy refresh when a token presents an unknown key ID. Token claims carry the caller's `account_id`, `project_id`, and agent identity, which are propagated into guardrail context and Observatory traces.
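
The priority order in the table can be sketched as a small resolver. This is an illustrative sketch only: `resolve_credential` and the returned method labels are hypothetical, and two behaviors are assumptions rather than documented facts: a malformed `X-Highflame-APIKey` value falls through to the `Authorization` header, and any bearer token without the `hf_sk_` prefix is treated as a JWT candidate.

```python
from typing import Mapping, Optional, Tuple


def resolve_credential(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (method, credential) for the highest-priority credential, or None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Priority 1: dedicated API key header.
    api_key = normalized.get("x-highflame-apikey", "")
    if api_key.startswith("hf_sk_"):
        return ("api_key", api_key)

    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token:
        # Priority 2: service key presented as a bearer token.
        if token.startswith("hf_sk_"):
            return ("service_key", token)
        # Priority 3 (assumption): anything else is a JWT candidate that must
        # still pass signature and claim validation.
        return ("jwt", token)

    return None  # the caller should respond with 401
```

A `None` result corresponds to the `401` case: no guardrail or provider work should start before this check passes.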

Agents using ZeroID obtain short-lived JWTs from the [highflame-authn](/agent-identity-zeroid/introduction.md) service. The short lifetime limits the blast radius of a stolen credential and enables continuous access evaluation (CAE) for real-time revocation.
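
The signing-key refresh behavior described in this section (a 5-minute refresh plus a lazy refresh on an unknown key ID) might look like the sketch below. `JwksCache`, `fetch`, and `clock` are hypothetical names; `fetch` stands in for whatever loads the JWKS document and is assumed to return a mapping from key ID to key material.

```python
import time
from typing import Callable, Dict, Optional

REFRESH_INTERVAL_SECONDS = 300  # 5 minutes, per the text above


class JwksCache:
    """Caches signing keys by key ID; `fetch` is a hypothetical JWKS loader."""

    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        self._keys = self._fetch()
        self._fetched_at = self._clock()

    def get(self, kid: str) -> Optional[str]:
        """Return the key for `kid`, refreshing the cache if needed."""
        stale = (self._fetched_at is None
                 or self._clock() - self._fetched_at >= REFRESH_INTERVAL_SECONDS)
        # Refresh on schedule, or lazily when the key ID is not in the cache.
        if stale or kid not in self._keys:
            self._refresh()
        return self._keys.get(kid)
```

This sketch refetches on every unknown key ID. A production cache would usually rate-limit those lazy refreshes so that tokens with random key IDs cannot force repeated fetches.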

***

## Model pinning

Callers specify a **route name**, not a model. The route configuration — stored in Highflame Studio and controlled by administrators — determines which provider and model handle the request. Applications cannot override this at call time.

This prevents:

* **Model substitution attacks** — an agent cannot be manipulated into routing a sensitive request to a different, less-safe model
* **Provider drift** — model selection is an admin decision, not a developer decision, and changes are audited
* **Prompt-based model switching** — even if an injected prompt attempts to redirect to a different model endpoint, the Gateway ignores it and uses the route's configured model

To change which model a route uses, an admin updates the route configuration in Studio. No application changes are required, and the change is reflected on the next request.
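
A minimal sketch of the pinning behavior, assuming a request body that may carry caller-supplied `model` or `provider` fields. `ROUTE_CONFIG` and `build_upstream_request` are hypothetical names, the provider and model values are placeholders, and the actual request shape used by the Gateway may differ.

```python
from typing import Any, Dict

# Hypothetical admin-controlled route table (in practice, managed in Studio).
ROUTE_CONFIG = {
    "support-chat": {"provider": "example-provider", "model": "example-model-v1"},
}


def build_upstream_request(route_name: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build the provider request, pinning provider and model from the route."""
    route = ROUTE_CONFIG[route_name]
    # Drop any caller-supplied model or provider fields; the route always wins.
    payload = {k: v for k, v in body.items() if k not in ("model", "provider")}
    payload["model"] = route["model"]
    return {"provider": route["provider"], "payload": payload}
```

Because the model comes only from the route table, changing `ROUTE_CONFIG` changes the model for the next request without touching any caller.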

***

## Input evaluation

Before any request is forwarded to the model provider, it passes through the Shield detection pipeline. Detection runs in three tiers ordered by latency:

### Tier 1 — Fast (< 5 ms, deterministic)

Runs on every request. These checks are deterministic and complete in under 5 ms:

| Detector              | What it checks                                                                                   |
| --------------------- | ------------------------------------------------------------------------------------------------ |
| **Secrets**           | API keys, bearer tokens, SSH keys, PEM certificates, cloud credentials (AWS, GCP, Azure, GitHub) |
| **PII**               | Email addresses, phone numbers, SSNs, credit card numbers, IP addresses                          |
| **Security patterns** | SQL injection, XSS, path traversal, command injection, invisible Unicode characters              |
| **Keyword matching**  | Aho-Corasick matching against configured blocklists                                              |
| **Tool risk scoring** | Scores tool call arguments (0–100) and flags sensitive tool categories                           |
| **Action pattern**    | State-machine detection of multi-step attack sequences (e.g., read file → HTTP post)             |
| **Loop detection**    | Detects repeated identical tool calls above the configured threshold (default: 5)                |
| **Token budget**      | Checks session token consumption against per-session limits                                      |
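
As one example of a Tier 1 check, loop detection can be approximated with a counter keyed on the tool name and its arguments. This is a sketch under assumptions: `LoopDetector` is a hypothetical name, "identical" is taken to mean the same tool name with the same arguments regardless of key order, and "above the threshold" is read as strictly greater than it.

```python
import json
from collections import Counter
from typing import Any, Dict

DEFAULT_LOOP_THRESHOLD = 5  # default from the table above


class LoopDetector:
    """Counts identical tool calls (same name and arguments) within a session."""

    def __init__(self, threshold: int = DEFAULT_LOOP_THRESHOLD) -> None:
        self.threshold = threshold
        self._counts: Counter = Counter()

    def record(self, tool_name: str, arguments: Dict[str, Any]) -> bool:
        """Record a call; return True once repeats exceed the threshold."""
        # Serialize with sorted keys so argument order does not affect identity.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self._counts[key] += 1
        # Assumption: "above the threshold" means strictly greater than.
        return self._counts[key] > self.threshold
```

With the default threshold, the sixth identical call is the first one flagged.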

### Tier 2 — Standard (10–200 ms, ML/NLP)

Runs on requests where Tier 1 does not trigger an early exit:

| Detector         | What it checks                                                                               |
| ---------------- | -------------------------------------------------------------------------------------------- |
| **Injection**    | ML model for prompt injection and jailbreak attempts (produces `injection_confidence` 0–100) |
| **Deep context** | Multi-turn injection detection using conversation history across the session                 |
| **Toxicity**     | Violence, hate speech, sexual content, criminal activity, weapons, profanity                 |
| **Language**     | Language identification across 75 languages                                                  |
| **Sentiment**    | Sentiment analysis for downstream policy use                                                 |

### Tier 3 — Slow (50–500 ms, cloud APIs)

Runs when elevated risk signals warrant deeper inspection:

| Detector           | What it checks                                                   |
| ------------------ | ---------------------------------------------------------------- |
| **Enterprise DLP** | Google Cloud DLP for regulated PII detection with fuzzy matching |
| **Content Safety** | Google Model Armor content safety API                            |
| **Phishing**       | CheckPhish URL scanning for malicious links in prompts           |

**Early exit:** If Tier 1 produces a `deny` decision and the Cedar policy does not require further evidence, Tiers 2 and 3 are skipped. This keeps the median latency impact under 5 ms for requests that are clearly safe or clearly blocked.
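
The early-exit idea can be sketched as a loop over tiers that stops at the first `deny`. This is a simplification: `run_pipeline` and `decide` are hypothetical names, `decide` stands in for Cedar policy evaluation, and the sketch does not model the "policy requires further evidence" condition or the risk-based gating of Tier 3.

```python
from typing import Callable, Dict, List, Tuple

Detector = Callable[[str], Dict[str, object]]


def run_pipeline(text: str,
                 tiers: List[List[Detector]],
                 decide: Callable[[Dict[str, object]], str]) -> Tuple[str, int]:
    """Run detector tiers in order; return (decision, number of tiers run)."""
    signals: Dict[str, object] = {}
    decision = "allow"
    for index, tier in enumerate(tiers, start=1):
        for detector in tier:
            signals.update(detector(text))
        decision = decide(signals)
        if decision == "deny":
            # Early exit: later, slower tiers are skipped.
            return decision, index
    return decision, len(tiers)
```

Ordering the tiers from cheapest to most expensive means a request that is denied by a deterministic check never pays for ML inference or external API calls.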

### Cedar policy enforcement

Detector outputs are projected into Cedar context keys (`injection_confidence`, `pii_detected`, `contains_secrets`, `tool_risk_score`, etc.) and evaluated against your Cedar policies. The policy decision determines the enforcement action:

| Action      | Behavior                                                                       |
| ----------- | ------------------------------------------------------------------------------ |
| **Block**   | Request is rejected with `403`; violation recorded in Observatory              |
| **Redact**  | Detected content is masked before the request is forwarded to the model        |
| **Alert**   | Request is forwarded; a detection event is emitted for review                  |
| **Monitor** | Request is forwarded; detection is recorded silently with no user notification |
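
The four enforcement actions can be summarized in a small dispatcher. This is an illustrative sketch: `enforce`, the event dictionaries, and the `[REDACTED]` mask are all hypothetical, and `findings` is assumed to be the list of exact substrings the detectors flagged.

```python
from typing import Dict, List, Optional


def enforce(action: str, request_text: str, findings: List[str],
            events: List[Dict[str, str]]) -> Optional[str]:
    """Apply an enforcement action.

    Returns the text to forward to the model, or None when the request
    must be rejected with a 403.
    """
    if action == "block":
        events.append({"type": "violation", "action": action})
        return None
    if action == "redact":
        redacted = request_text
        for finding in findings:
            redacted = redacted.replace(finding, "[REDACTED]")
        return redacted
    if action in ("alert", "monitor"):
        # Both forward the request unchanged; they differ only in how the
        # detection is surfaced (emitted for review vs. recorded silently).
        events.append({"type": "detection", "action": action})
        return request_text
    raise ValueError(f"unknown action: {action}")
```

Only `block` stops the request; `redact` changes what the model sees, while `alert` and `monitor` leave the payload untouched.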

***

## Session-aware context

The detection pipeline is session-aware. Detection signals and action history are persisted in Redis across turns, so policies can respond to the cumulative state of a conversation:

```cedar
// Block network tool calls if PII was seen in any prior turn
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when {
    context has session_pii_detected && context.session_pii_detected == true &&
    context has tool_name && context.tool_name == "http_post"
};
```

Session state includes:

* `session_pii_detected` — whether PII has appeared in any turn
* `session_secrets_detected` — whether secrets have appeared in any turn
* `session_injection_detected` — whether an injection signal has fired in any turn
* `session_cumulative_risk_score` — running total of risk contributions across all turns
* `session_threat_turns` — number of turns with at least one threat signal
* `session_max_injection_score` — peak injection score seen in any turn

This enables circuit-breaker policies that lock down sensitive operations after a threshold of risk accumulates — see [Abuse Detection and Control](/agent-authorization-and-control-shield/policy-templates/abuse-detection.md) for patterns.
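
As a rough illustration of how the session fields listed above could accumulate across turns, the sketch below folds one turn's signals into a running state object. `SessionState` and `update_session` are hypothetical names, the injection threshold of 80 is an arbitrary example value, and an in-memory object stands in for the Redis-backed store.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    session_pii_detected: bool = False
    session_secrets_detected: bool = False
    session_injection_detected: bool = False
    session_cumulative_risk_score: int = 0
    session_threat_turns: int = 0
    session_max_injection_score: int = 0


def update_session(state: SessionState, pii: bool, secrets: bool,
                   injection_score: int, turn_risk: int,
                   injection_threshold: int = 80) -> SessionState:
    """Fold one turn's signals into the running session state (mutates `state`)."""
    injected = injection_score >= injection_threshold  # hypothetical threshold
    # Boolean flags latch: once set in any turn, they stay set.
    state.session_pii_detected |= pii
    state.session_secrets_detected |= secrets
    state.session_injection_detected |= injected
    state.session_cumulative_risk_score += turn_risk
    state.session_max_injection_score = max(
        state.session_max_injection_score, injection_score)
    if pii or secrets or injected:
        state.session_threat_turns += 1
    return state
```

Because the flags never reset within a session, a policy evaluated on a later, otherwise clean turn can still react to something seen earlier in the conversation.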

***

## Output evaluation

Model responses are evaluated by the same Shield pipeline before being returned to the caller. Output-phase detection checks for:

| Check                   | What it catches                                            |
| ----------------------- | ---------------------------------------------------------- |
| **PII in response**     | Personal information generated or repeated by the model    |
| **Secrets in response** | Credentials the model may have retrieved or hallucinated   |
| **Hallucination**       | Factual inconsistency scored by the hallucination detector |
| **Phishing links**      | Malicious URLs in model-generated content                  |
| **Toxicity**            | Harmful content in the model's output                      |

If the output evaluation produces a `deny`, the response is blocked and the caller receives a `403` — the model's response is never returned. If `redact` applies, sensitive content is masked before the response leaves the Gateway.
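
To make the `redact` path concrete, here is a deliberately simple masking sketch for model output. The patterns, the `[EMAIL]` and `[SSN]` placeholders, and the `redact_response` name are assumptions for illustration; the Shield PII detector covers far more than these two regular expressions.

```python
import re

# Deliberately simple patterns for illustration; real PII detection is broader.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_response(text: str) -> str:
    """Mask email addresses and US SSN-shaped strings in a model response."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```

For example, `redact_response("Contact jane.doe@example.com, SSN 123-45-6789.")` returns `"Contact [EMAIL], SSN [SSN]."`.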

***

## Rate limiting and token budgets

**Provider rate limits:** The Gateway automatically retries on `429` (rate limited) and `5xx` responses from model providers using exponential backoff, up to 3 attempts. This is transparent to callers.
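
The retry behavior can be sketched as follows. This is not the Gateway's code: `call_with_retry`, `send`, and `base_delay` are hypothetical, "up to 3 attempts" is read as three total attempts, and the base delay of 0.5 seconds is an arbitrary example value.

```python
import time
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # per the text above, read here as total attempts


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send`, retrying on 429 and 5xx with exponential backoff."""
    for attempt in range(MAX_ATTEMPTS):
        status, body = send()
        retryable = status == 429 or 500 <= status <= 599
        if not retryable or attempt == MAX_ATTEMPTS - 1:
            return status, body
        # Delays double each time: base_delay, then 2 * base_delay.
        sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Production retry loops commonly add random jitter to each delay so that many clients do not retry in lockstep; that is omitted here for brevity.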

**Per-session token budgets:** Token consumption is tracked across turns in the session. When a session exceeds its configured budget, the `budget_exceeded` flag fires and Cedar policies can block further requests. This prevents runaway sessions from consuming unbounded model API spend.
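
A minimal sketch of per-session budget tracking, assuming the flag fires only once usage is strictly over the limit. `TokenBudget` and its methods are hypothetical names; only the `budget_exceeded` flag name comes from the text above.

```python
class TokenBudget:
    """Tracks cumulative token use for one session against a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one turn's usage; return the resulting budget_exceeded flag."""
        self.used += input_tokens + output_tokens
        return self.budget_exceeded

    @property
    def budget_exceeded(self) -> bool:
        # Assumption: the flag fires only once usage is strictly over the limit.
        return self.used > self.limit
```

The tracker only reports the flag. Whether an over-budget session is actually blocked remains a policy decision, as described above.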

**Cost tracking:** Every model call is recorded with token counts and cost attribution in Observatory. The [Command Center cost intelligence view](/observatory/command-center.md#cost-intelligence) surfaces per-agent and per-model cost breakdowns in real time.

***

## Audit trail

Every model call generates a trace in Observatory containing:

* Caller identity (agent ID, user, application)
* Route and provider/model used
* Request timestamp and end-to-end latency
* Shield evaluation results: detector scores, Cedar policy decisions, determining policies
* Token usage (input and output tokens)
* Cost attribution
* Full payload snapshot at each evaluation point (subject to your data handling policy)

Traces are retained for 30 days. Detection events are retained for 90 days. See [Traces](/observatory/traces.md) and [Threats](/observatory/threats.md) in Observatory for investigation surfaces.
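
When planning an investigation, it can help to check whether a record is still within its retention window. The helper below is a small sketch using the retention periods stated above; `is_retained` and the `"trace"` and `"detection_event"` labels are hypothetical, and treating the final day as still retained is an assumption.

```python
from datetime import datetime, timedelta

TRACE_RETENTION = timedelta(days=30)
DETECTION_EVENT_RETENTION = timedelta(days=90)


def is_retained(kind: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a record of the given kind is still within retention."""
    windows = {
        "trace": TRACE_RETENTION,
        "detection_event": DETECTION_EVENT_RETENTION,
    }
    return now - created_at <= windows[kind]
```

Note the gap between the two windows: for an incident older than 30 days, the detection event may still be available even though the full trace is not.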

***

## Related

* [Multi-Protocol Gateway](/agent-gateway/ai-gateway.md) — provider configuration, routes, and MCP overview
* [Integrated Guardrails](/agent-gateway/agent-gateway.md) — Cedar policy configuration for gateway traffic
* [Credential Modes](/agent-gateway/credential-modes.md) — per-user credential handling for MCP servers
* [Policy Templates](/agent-authorization-and-control-shield/policy-templates.md) — pre-built Cedar profiles for common deployment types
* [Abuse Detection and Control](/agent-authorization-and-control-shield/policy-templates/abuse-detection.md) — loop detection, budget controls, session circuit breakers

