# Guardrail Evaluations

Guardrails evaluate content at multiple points in the request lifecycle, in both blocking and non-blocking modes, and with awareness of the full session context. This page explains when evaluation happens, how checks are layered, and how enforcement mode affects behavior.

***

### Evaluation Points

Every request flowing through Highflame can be evaluated at up to four points:

| Phase             | What is evaluated                                    | Typical purpose                          |
| ----------------- | ---------------------------------------------------- | ---------------------------------------- |
| **Input**         | User prompt before reaching the model                | Block injection, validate content        |
| **Tool call**     | Tool name and arguments before execution             | Block dangerous or unauthorized tool use |
| **Tool response** | Output returned from a tool before the model sees it | Block indirect injection, data leakage   |
| **Output**        | Model response before it reaches the user            | Block sensitive data, unsafe content     |

You can enable guardrails at any combination of these points independently.
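To make "any combination" concrete, here is a minimal sketch in Python of a guard that evaluates content only at the phases you enable. The `Phase` names mirror the table above. `make_guard` and the toy `no_aws_key` rule are hypothetical illustrations, not Highflame APIs.

```python
from enum import Enum
from typing import Callable, Set

class Phase(Enum):
    """The four evaluation points from the table above."""
    INPUT = "input"
    TOOL_CALL = "tool_call"
    TOOL_RESPONSE = "tool_response"
    OUTPUT = "output"

# Hypothetical checker signature: returns True if the content is allowed.
Checker = Callable[[str], bool]

def make_guard(enabled: Set[Phase], checker: Checker) -> Callable[[Phase, str], bool]:
    """Build a guard that evaluates content only at the enabled phases."""
    def guard(phase: Phase, content: str) -> bool:
        if phase not in enabled:
            return True  # phase not configured: content passes unevaluated
        return checker(content)
    return guard

def no_aws_key(content: str) -> bool:
    """Toy rule for illustration: reject anything containing an AWS key prefix."""
    return "AKIA" not in content

# Guard inputs and tool calls only; tool responses and outputs are not evaluated.
guard = make_guard({Phase.INPUT, Phase.TOOL_CALL}, no_aws_key)
```

With this configuration, `guard(Phase.INPUT, "my key is AKIA1234")` returns `False`. The same content at `Phase.OUTPUT` returns `True`, because output evaluation is not enabled.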

***

### Layered Detection

Within each evaluation point, checks are layered by speed and depth:

**Fast checks** run first: deterministic, sub-millisecond pattern matching for secrets, PII, known injection patterns, command injection, SQL injection, and other high-confidence signals. Their latency impact is minimal.

**Semantic checks** follow: ML-based analysis of prompt injection confidence, jailbreak likelihood, toxicity, and multi-turn behavioral context. They run when the fast checks have not already produced a definitive decision, or when policy requires deeper analysis.

**Deep checks** run last: cloud-based or computationally intensive analysis such as enterprise DLP or file content safety. These are opt-in and run only when configured.

Each layer is protected by a circuit breaker. If a deeper check is unavailable, evaluation continues without it. The check fails open, so a request is not blocked solely because that check could not run.
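The layering and fail-open behavior can be sketched as a cheapest-first loop that stops at the first definitive verdict. This is a minimal illustration under stated assumptions. The three check functions are stand-ins, not Highflame detectors. The `CircuitBreaker` is a simplified, hypothetical one that opens after consecutive failures and has no reset timer. The sketch models only the "no definitive decision yet" trigger for deeper layers, not policy-driven escalation.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"

# A layer returns a definitive Verdict, or None to defer to the next layer.
Layer = Callable[[str], Optional[Verdict]]

@dataclass
class CircuitBreaker:
    """Simplified breaker: opens after `threshold` consecutive failures (no reset timer)."""
    threshold: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

def evaluate(content: str, layers: List[Layer], breakers: List[CircuitBreaker]) -> Verdict:
    """Run layers cheapest-first; stop at the first definitive verdict."""
    for layer, breaker in zip(layers, breakers):
        if breaker.is_open:
            continue  # layer treated as unavailable: skip it rather than block
        try:
            verdict = layer(content)
        except Exception:
            breaker.record(ok=False)
            continue  # fail open: an erroring check never blocks on its own
        breaker.record(ok=True)
        if verdict is not None:
            return verdict
    return Verdict.ALLOW  # no layer objected

AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")

def fast_check(content: str) -> Optional[Verdict]:
    """Deterministic pattern match: block on a high-confidence secret."""
    return Verdict.BLOCK if AWS_KEY.search(content) else None

def semantic_check(content: str) -> Optional[Verdict]:
    """Stand-in for an ML classifier; a real one would return a learned score."""
    score = 0.9 if "ignore previous instructions" in content.lower() else 0.1
    return Verdict.BLOCK if score >= 0.8 else None

def deep_check(content: str) -> Optional[Verdict]:
    """Simulates an opt-in remote check that is currently unreachable."""
    raise TimeoutError("deep check unavailable")

LAYERS: List[Layer] = [fast_check, semantic_check, deep_check]
```

A benign message passes through all three layers. The unreachable `deep_check` only increments its breaker's failure count, and the result is still `Verdict.ALLOW`.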

***

### Session Awareness

Guardrails maintain state across conversation turns. For each session, Highflame tracks:

* Cumulative signals from previous turns (injection scores, detected patterns, tool history)
* Behavioral sequences — whether the agent's tool call patterns match known attack trajectories
* Token consumption across the session
* Repeated tool invocations that may indicate a loop

This means a message that appears benign in isolation can still be caught if it's part of a pattern that has been building across the conversation.
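A minimal sketch of per-session state covering the four tracked signals follows. The thresholds, window size, and tool names are hypothetical. The point is that a score below the per-message threshold can still trigger a flag once moderate scores accumulate across turns.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

TURN_THRESHOLD = 0.8     # hypothetical: flag a single message at or above this score
SESSION_THRESHOLD = 1.2  # hypothetical: flag when recent scores add up to this

@dataclass
class SessionState:
    injection_scores: List[float] = field(default_factory=list)
    tool_calls: List[str] = field(default_factory=list)
    tokens_used: int = 0

    def record_turn(self, injection_score: float, tools: List[str], tokens: int) -> None:
        """Fold one turn's signals into the session."""
        self.injection_scores.append(injection_score)
        self.tool_calls.extend(tools)
        self.tokens_used += tokens

    def should_flag(self, window: int = 3) -> bool:
        """Flag on a high single score, or on moderate scores that accumulate."""
        if not self.injection_scores:
            return False
        latest = self.injection_scores[-1]
        recent_total = sum(self.injection_scores[-window:])
        return latest >= TURN_THRESHOLD or recent_total >= SESSION_THRESHOLD

    def matches_trajectory(self, pattern: List[str]) -> bool:
        """True if `pattern` occurs as a contiguous run in the tool history."""
        n = len(pattern)
        return any(self.tool_calls[i:i + n] == pattern
                   for i in range(len(self.tool_calls) - n + 1))

    def looping_tools(self, limit: int = 5) -> List[str]:
        """Tools invoked at least `limit` times this session (possible loop)."""
        return [tool for tool, n in Counter(self.tool_calls).items() if n >= limit]
```

Suppose three consecutive turns each score `0.5`. None of them crosses `TURN_THRESHOLD` on its own, but after the third turn the recent total is `1.5`, so `should_flag()` returns `True`.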

***

### Enforcement Modes

Guardrails operate in one of three modes, set per-request or as a default for the project:

| Mode        | Behavior                                                                                    |
| ----------- | ------------------------------------------------------------------------------------------- |
| **Enforce** | Violations are blocked. The request or response does not proceed.                           |
| **Alert**   | Violations generate alerts but the request proceeds. Useful for monitoring live traffic.    |
| **Monitor** | Decisions are logged with no action taken. Useful for validating policy before enforcement. |

The recommended rollout sequence is Monitor → Alert → Enforce. This lets you observe how policies behave against real traffic before enabling blocking.
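The table above maps a detected violation to an outcome. A minimal sketch of that mapping follows. It assumes that a per-request mode, when present, overrides the project default. `resolve_mode` and `handle_violation` are hypothetical names used for illustration.

```python
from enum import Enum
from typing import Optional, Tuple

class Mode(Enum):
    ENFORCE = "enforce"
    ALERT = "alert"
    MONITOR = "monitor"

# Recommended rollout order, from observation to blocking.
ROLLOUT = [Mode.MONITOR, Mode.ALERT, Mode.ENFORCE]

def resolve_mode(project_default: Mode, request_override: Optional[Mode] = None) -> Mode:
    """Assumption: a per-request mode, if given, takes precedence over the project default."""
    return request_override if request_override is not None else project_default

def handle_violation(mode: Mode) -> Tuple[bool, str]:
    """Return (request_proceeds, action) for a detected violation."""
    if mode is Mode.ENFORCE:
        return False, "block"
    if mode is Mode.ALERT:
        return True, "alert"
    return True, "log"  # Monitor: record the decision only
```

Advancing a project through `ROLLOUT` changes only the outcome of a violation. The detection that produced the violation stays the same.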

***

### Inline vs. Background Evaluation

Guardrails that must produce a decision before the request continues — prompt inspection, tool argument validation — run **inline** and block until a decision is reached. These are optimized for low latency.

Guardrails used for richer analysis, alerting, or post-hoc review can run **in the background** without blocking the request path. Even when run in the background, they contribute to session state and can influence future decisions within the same conversation.
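The split between inline and background evaluation can be sketched with `asyncio`. The inline check is awaited before the request proceeds. The background analysis is scheduled with `asyncio.create_task` and writes its findings into session state once it finishes. The check logic, the `session_signals` store, and the `credential_probe` signal name are all hypothetical.

```python
import asyncio
from typing import Dict, List

# Hypothetical per-session signal store shared by inline and background checks.
session_signals: Dict[str, List[str]] = {}

async def inline_prompt_check(session_id: str, prompt: str) -> bool:
    """Fast, blocking check: the request waits for this result."""
    await asyncio.sleep(0)  # stand-in for a low-latency check
    return "ignore previous instructions" not in prompt.lower()

async def background_analysis(session_id: str, prompt: str) -> None:
    """Slower analysis that never blocks the request path."""
    await asyncio.sleep(0.01)  # stand-in for a slower model call
    if "password" in prompt.lower():
        session_signals.setdefault(session_id, []).append("credential_probe")

async def handle_request(session_id: str, prompt: str, pending: List[asyncio.Task]) -> str:
    if not await inline_prompt_check(session_id, prompt):
        return "blocked"
    # Schedule background work without awaiting it; the request continues now.
    pending.append(asyncio.create_task(background_analysis(session_id, prompt)))
    return "forwarded to model"

async def main() -> None:
    pending: List[asyncio.Task] = []
    print(await handle_request("s1", "What is my admin password?", pending))
    print(session_signals.get("s1"))  # None: background analysis has not finished
    await asyncio.gather(*pending)
    print(session_signals.get("s1"))  # ['credential_probe']: visible to later turns

if __name__ == "__main__":
    asyncio.run(main())
```

The background task never delays `handle_request`. Its result still lands in the session store, where a later turn's inline check could consult it.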

***

### Per-Agent and Per-Environment Scoping

Guardrail policies are scoped to your project and can be further scoped to specific agents, environments, or trust levels. This means:

* External-facing agents can have stricter controls than internal ones
* Development environments can run in monitor mode while production enforces
* High-trust agents (first-party, fully delegated) can be granted wider permissions without changing the underlying detection logic

Scoping is expressed in Cedar policies using the context keys produced by detectors. See the [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) for examples.
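The idea behind scoping, one shared detector with decisions that vary by scope, can be illustrated in Python. This is not Cedar syntax and does not reflect Highflame's policy engine. The trust levels, thresholds, environment names, and the fallback to enforce for unknown environments are all hypothetical.

```python
from typing import Dict

# Hypothetical per-scope settings. The detector is shared; only the decision
# threshold (by trust level) and enforcement mode (by environment) vary.
MODE_BY_ENVIRONMENT: Dict[str, str] = {"development": "monitor", "production": "enforce"}
BLOCK_THRESHOLD_BY_TRUST: Dict[str, float] = {
    "external": 0.5,     # stricter: act on lower injection scores
    "internal": 0.7,
    "first_party": 0.9,  # high trust: wider latitude, same detector
}

def injection_score(prompt: str) -> float:
    """Stand-in for a shared detector; identical for every agent."""
    return 0.6 if "system prompt" in prompt.lower() else 0.1

def decide(prompt: str, trust: str, environment: str) -> str:
    """Combine the shared score with scope-specific threshold and mode."""
    score = injection_score(prompt)
    if score < BLOCK_THRESHOLD_BY_TRUST[trust]:
        return "allow"
    # Assumption: unknown environments default to enforcement.
    mode = MODE_BY_ENVIRONMENT.get(environment, "enforce")
    return "block" if mode == "enforce" else "log"
```

The same prompt scoring `0.6` is blocked for an external agent in production. It is only logged for the same agent in development, and it is allowed for a first-party agent.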


