# Threat Alerts

Whenever Shield evaluates a request and a policy fires, it generates a structured threat alert alongside any enforcement action (block, redact, or allow). Threat alerts are the primary record of what Shield detected, which policy drove the decision, and what context was present at the time.

Alerts are collected in [Observatory → Threats](/observatory/threats.md), where they can be filtered, investigated, and exported. This page covers what Shield puts into each alert, how alerts are categorized, how guardrail health failures are surfaced separately, and where alerts can be forwarded.

***

## Alert structure

Every Shield alert includes:

* **Determining policies** — the specific Cedar policy IDs that drove the decision, with their `@id`, `@severity`, and `@tags` annotations
* **Projected context** — the detector outputs Cedar evaluated (injection score, PII flags, secret types, tool risk score, etc.)
* **Enforcement action** — Block, Redact, Alert, or Monitor
* **Policy reason** — the human-readable explanation from the Cedar policy's `@reject_message` annotation
* **Evaluation point** — which phase fired (user prompt, tool call, tool response, assistant response)
* **Session context** — session ID, turn number, and any session-level signals that were active
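
To make the shape of an alert concrete, the fields listed above can be modeled as a small record type. This is only an illustrative sketch: the class names, field names, and sample values below are assumptions chosen to mirror the list, not Shield's actual export or wire format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DeterminingPolicy:
    # Hypothetical field names mirroring the Cedar annotations named above.
    policy_id: str                                  # from the @id annotation
    severity: str                                   # from the @severity annotation
    tags: List[str] = field(default_factory=list)   # from the @tags annotation


@dataclass
class ThreatAlert:
    # Hypothetical record; one attribute per item in the alert structure list.
    determining_policies: List[DeterminingPolicy]
    projected_context: Dict[str, Any]   # detector outputs Cedar evaluated
    enforcement_action: str             # e.g. "Block", "Redact", "Alert", "Monitor"
    policy_reason: str                  # text of the @reject_message annotation
    evaluation_point: str               # e.g. "user prompt", "tool call"
    session_id: str
    turn_number: int
    session_signals: List[str] = field(default_factory=list)


def summarize(alert: ThreatAlert) -> str:
    """Render a one-line triage summary of an alert."""
    ids = ", ".join(p.policy_id for p in alert.determining_policies)
    return (
        f"[{alert.enforcement_action}] {alert.evaluation_point}, "
        f"turn {alert.turn_number}: {alert.policy_reason} (policies: {ids})"
    )


# Sample values are invented for illustration only.
example_alert = ThreatAlert(
    determining_policies=[
        DeterminingPolicy("block-high-injection", "high", ["injection"]),
    ],
    projected_context={"injection_score": 0.97},
    enforcement_action="Block",
    policy_reason="Prompt injection detected",
    evaluation_point="user prompt",
    session_id="sess-123",
    turn_number=4,
)

print(summarize(example_alert))
```

Keeping the determining policies as a list reflects that more than one policy can drive a single decision.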

***

## Alert categories

Shield categorizes alerts by detected risk type. Each alert carries a category used for filtering and trending in Observatory.

| Category                             | What triggers it                                               |
| ------------------------------------ | -------------------------------------------------------------- |
| **Prompt Injection**                 | ML injection confidence above threshold                        |
| **Jailbreak Attempts**               | Jailbreak classifier above threshold                           |
| **Indirect Injection**               | Injection in tool outputs or retrieved content                 |
| **Sensitive Data**                   | PII detected; action was Reject, Masked, Replaced, or Redacted |
| **Secrets Leakage**                  | API keys, tokens, credentials, private keys                    |
| **Restricted Keywords**              | Keyword blocklist match                                        |
| **Command Injection**                | Command injection pattern in tool arguments                    |
| **SQL Injection**                    | SQL injection payload in inputs or outputs                     |
| **Path Traversal**                   | Directory traversal attempt                                    |
| **Encoded Injection**                | Base64 or invisible-character obfuscation in inputs            |
| **Phishing URLs**                    | Malicious URL detected in prompt or model output               |
| **Sexual Content**                   | Toxicity classifier — sexual content                           |
| **Violence**                         | Toxicity classifier — violence                                 |
| **Hate Speech**                      | Toxicity classifier — hate speech                              |
| **Profanity**                        | Toxicity classifier — profanity                                |
| **Weapons / Crime**                  | Toxicity classifier — weapons or criminal activity             |
| **Non-English Language**             | Language detection for restricted-language policies            |
| **Non-ASCII / Invisible Characters** | Non-ASCII or invisible Unicode characters                      |
| **High Entropy**                     | High-entropy string patterns (potential encoded content)       |
| **Markdown / Code**                  | Markdown or code block patterns triggering content rules       |
| **Custom Guardrails**                | Policy-defined custom rules not covered by a built-in category |
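
Because every alert carries exactly one category, trending and filtering reduce to grouping on that value. The sketch below assumes alerts have been exported as a list of dictionaries with a `category` key; that key name and the sample records are hypothetical, so adapt them to the actual export format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def count_by_category(alerts: Iterable[Dict[str, str]]) -> Counter:
    """Count alerts per category (assumes a hypothetical 'category' key)."""
    return Counter(alert["category"] for alert in alerts)


def filter_by_category(
    alerts: Iterable[Dict[str, str]], wanted: Set[str]
) -> List[Dict[str, str]]:
    """Keep only alerts whose category is in the wanted set."""
    return [alert for alert in alerts if alert["category"] in wanted]


# Invented sample export, using category names from the table above.
exported = [
    {"category": "Prompt Injection"},
    {"category": "Secrets Leakage"},
    {"category": "Prompt Injection"},
    {"category": "Path Traversal"},
]

counts = count_by_category(exported)
injection_like = filter_by_category(exported, {"Prompt Injection", "Path Traversal"})
```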

***

## Guardrail failure alerts

A separate category — **Requests With Guardrail Failure** — tracks operational issues that prevented a policy from evaluating correctly. These are not security events; they are health signals for your guardrail configuration.

A failure can occur when:

* A policy references a detector signal that could not be computed (missing dependency, configuration error)
* A custom webhook detector timed out or returned an error
* An internal processing error prevented evaluation from completing

Guardrail failures are surfaced in **Observatory → Threats** with a distinct badge. Each failure record includes the failing policy ID, the error code, and the evaluation point where it occurred.

Failures represent potential blind spots. A request that fails evaluation is not blocked: by default, Shield allows the request through when evaluation fails (it fails open). Monitor the failure rate of every policy in production and treat a spike as a configuration incident.
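
One way to act on this advice is to compute a per-policy failure rate and compare it against a known baseline. The following is a minimal sketch under stated assumptions: the input is a stream of `(policy_id, failed)` pairs you have derived from exported records, and the `factor` and `floor` thresholds are arbitrary illustrative defaults, not values recommended by Shield.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def failure_rates(evaluations: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of failed evaluations for each policy ID."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for policy_id, failed in evaluations:
        totals[policy_id] += 1
        if failed:
            failures[policy_id] += 1
    return {policy_id: failures[policy_id] / totals[policy_id] for policy_id in totals}


def spiking_policies(
    rates: Dict[str, float],
    baseline: Dict[str, float],
    factor: float = 3.0,   # assumed multiplier over baseline that counts as a spike
    floor: float = 0.01,   # assumed minimum rate worth flagging at all
) -> List[str]:
    """List policies whose failure rate exceeds factor times their baseline."""
    return sorted(
        policy_id
        for policy_id, rate in rates.items()
        if rate >= floor and rate > factor * baseline.get(policy_id, 0.0)
    )


# Invented sample: 2% failures for one policy, 20% for another.
evaluations = (
    [("pii-redact", False)] * 98
    + [("pii-redact", True)] * 2
    + [("webhook-check", False)] * 80
    + [("webhook-check", True)] * 20
)
rates = failure_rates(evaluations)
flagged = spiking_policies(rates, baseline={"pii-redact": 0.02, "webhook-check": 0.01})
```

Since a failed evaluation lets the request through, a flagged policy here means traffic that policy was meant to inspect is currently passing unchecked.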

***

## Proactive alerting

Shield alerts can be forwarded in real time to your existing incident response and monitoring tooling. Configuration is covered in [Integrations → Alerts](/integrations/alerts.md). Supported destinations:

* **Slack** — route specific alert categories or severities to channels
* **Splunk HEC** — stream events to your SIEM
* **Webhooks** — deliver structured JSON payloads to any endpoint
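
On the receiving side of a webhook, a small routing function can decide where each alert goes. This sketch is illustrative only: the payload keys (`category`, `severity`), the severity labels, and the destination names are all assumptions, so check the real payload schema in the integrations documentation before relying on any of them.

```python
import json
from typing import List

# Hypothetical severity labels, ordered from least to most urgent.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def destinations_for(payload: bytes, page_at: str = "high") -> List[str]:
    """Choose destinations for one webhook payload (assumed JSON object)."""
    alert = json.loads(payload)
    category = alert.get("category", "")
    # Unknown or missing severities are treated as the lowest rank.
    rank = SEVERITY_RANK.get(alert.get("severity", "low"), 0)

    targets = ["siem"]  # always archive the event
    if category == "Requests With Guardrail Failure":
        # Health signal, not a security event: send to whoever owns the config.
        targets.append("platform-oncall")
    elif rank >= SEVERITY_RANK[page_at]:
        targets.append("security-oncall")
    return targets


critical = destinations_for(b'{"category": "Secrets Leakage", "severity": "critical"}')
failure = destinations_for(b'{"category": "Requests With Guardrail Failure", "severity": "low"}')
routine = destinations_for(b'{"category": "Profanity", "severity": "low"}')
```

Routing guardrail failures separately from security detections keeps health signals from being triaged as attacks.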

***

## Related

* [Observatory → Threats](/observatory/threats.md) — faceted search, investigation workflow, and event export
* [Observability](/agent-authorization-and-control-shield/observability.md) — how Shield decisions appear in traces and sessions
* [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — tuning policy thresholds and `@reject_message` annotations
* [Integrations → Alerts](/integrations/alerts.md) — configuring Slack, Splunk, and webhook destinations

