Threat Alerts
Whenever Shield evaluates a request and a policy fires, it generates a structured threat alert alongside any enforcement action (block, redact, or allow). Threat alerts are the primary record of what Shield detected, which policy drove the decision, and what context was present at the time.
Alerts are collected in Observatory → Threats, where they can be filtered, investigated, and exported. This page covers what Shield puts into each alert and how guardrail health failures are surfaced separately.
Alert structure
Every Shield alert includes:
Determining policies: the specific Cedar policy IDs that drove the decision, with their @id, @severity, and @tags annotations
Projected context: the detector outputs Cedar evaluated (injection score, PII flags, secret types, tool risk score, etc.)
Enforcement action: Block, Redact, Alert, or Monitor
Policy reason: the human-readable explanation from the Cedar policy's @reject_message annotation
Evaluation point: which phase fired (user prompt, tool call, tool response, assistant response)
Session context: session ID, turn number, and any session-level signals that were active
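To make the shape of an alert concrete, the fields listed above can be modeled as a small record type. This is only an illustrative sketch: the class names, field names, and sample values are assumptions chosen to mirror the list, not Shield's actual export schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DeterminingPolicy:
    # Hypothetical field names mirroring the Cedar annotations named above.
    policy_id: str                                   # value of @id
    severity: str                                    # value of @severity
    tags: list[str] = field(default_factory=list)    # values of @tags


@dataclass
class ThreatAlert:
    # Hypothetical record layout; one attribute per item in the list above.
    determining_policies: list[DeterminingPolicy]
    projected_context: dict[str, Any]   # detector outputs Cedar evaluated
    enforcement_action: str             # "Block", "Redact", "Alert", or "Monitor"
    policy_reason: str                  # text of the @reject_message annotation
    evaluation_point: str               # which phase fired
    session_id: str
    turn_number: int
    session_signals: list[str] = field(default_factory=list)


# Sample values are invented for illustration only.
example = ThreatAlert(
    determining_policies=[
        DeterminingPolicy("block-prompt-injection", "high", ["injection"])
    ],
    projected_context={"injection_score": 0.97},
    enforcement_action="Block",
    policy_reason="Prompt injection detected in user input.",
    evaluation_point="user prompt",
    session_id="sess-123",
    turn_number=4,
)
```

A typed record like this is mainly useful on the consuming side, for example when validating exported alerts before loading them into another system.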
Alert categories
Shield categorizes alerts by detected risk type. Each alert carries a category used for filtering and trending in Observatory.
Prompt Injection: ML injection confidence above threshold
Jailbreak Attempts: Jailbreak classifier above threshold
Indirect Injection: Injection in tool outputs or retrieved content
Sensitive Data: PII detected; action was Reject, Masked, Replaced, or Redacted
Secrets Leakage: API keys, tokens, credentials, private keys
Restricted Keywords: Keyword blocklist match
Command Injection: Command injection pattern in tool arguments
SQL Injection: SQL injection payload in inputs or outputs
Path Traversal: Directory traversal attempt
Encoded Injection: Base64 or invisible-character obfuscation in inputs
Phishing URLs: Malicious URL detected in prompt or model output
Sexual Content: Toxicity classifier, sexual content
Violence: Toxicity classifier, violence
Hate Speech: Toxicity classifier, hate speech
Profanity: Toxicity classifier, profanity
Weapons / Crime: Toxicity classifier, weapons or criminal activity
Non-English Language: Language detection for restricted-language policies
Non-ASCII / Invisible Characters: Non-ASCII or invisible Unicode characters
High Entropy: High-entropy string patterns (potential encoded content)
Markdown / Code: Markdown or code block patterns triggering content rules
Custom Guardrails: Policy-defined custom rules not covered by a built-in category
Guardrail failure alerts
A separate category — Requests With Guardrail Failure — tracks operational issues that prevented a policy from evaluating correctly. These are not security events; they are health signals for your guardrail configuration.
A failure can occur when:
A policy references a detector signal that could not be computed (missing dependency, configuration error)
A custom webhook detector timed out or returned an error
An internal processing error prevented evaluation from completing
Guardrail failures are surfaced in Observatory → Threats with a distinct badge. Each failure record includes the failing policy ID, the error code, and the evaluation point where it occurred.
Failures represent potential blind spots. A request that fails evaluation is not blocked — the default behavior on evaluation failure is to allow the request through. Monitor the failure rate for any policy in production and treat a spike as a configuration incident.
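One way to watch for such spikes is to compute a per-policy failure rate from exported evaluation records and flag any policy above a chosen threshold. The sketch below assumes each record is a dict with the hypothetical keys `policy_id` and `failed`; the export format and the threshold value are assumptions, not Shield defaults.

```python
from collections import Counter


def failure_rates(evaluations):
    """Return {policy_id: failed_count / total_count}.

    Each record is assumed to be a dict with the hypothetical keys
    "policy_id" (str) and "failed" (bool).
    """
    totals = Counter()
    failures = Counter()
    for record in evaluations:
        totals[record["policy_id"]] += 1
        if record["failed"]:
            failures[record["policy_id"]] += 1
    return {pid: failures[pid] / totals[pid] for pid in totals}


def policies_over_threshold(evaluations, threshold=0.05):
    """Return the sorted policy IDs whose failure rate exceeds the threshold."""
    rates = failure_rates(evaluations)
    return sorted(pid for pid, rate in rates.items() if rate > threshold)


# Invented sample data: one policy never fails, one fails half the time.
sample = [
    {"policy_id": "pii-redact", "failed": False},
    {"policy_id": "pii-redact", "failed": False},
    {"policy_id": "webhook-check", "failed": True},
    {"policy_id": "webhook-check", "failed": False},
]
flagged = policies_over_threshold(sample, threshold=0.25)
```

Because a failed evaluation lets the request through by default, a rate-based check like this is more useful than a raw count: it stays meaningful as traffic volume changes.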
Proactive alerting
Shield alerts can be forwarded in real time to your existing incident response and monitoring tooling. See Integrations → Alerts for configuration:
Slack — route specific alert categories or severities to channels
Splunk HEC — stream events to your SIEM
Webhooks — deliver structured JSON payloads to any endpoint
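On the receiving side of a webhook, a small routing function can decide where each JSON payload goes. The following is a minimal sketch under stated assumptions: the payload keys `category` and `severity`, the severity values, and the destination names are all hypothetical and should be replaced with whatever your configured payload actually contains.

```python
import json


def route_alert(raw_body: bytes) -> str:
    """Pick a destination name for one webhook payload.

    Keys ("category", "severity"), severity values, and destination
    names are hypothetical placeholders, not part of Shield's schema.
    """
    try:
        alert = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return "dead-letter"
    if not isinstance(alert, dict):
        return "dead-letter"
    # Guardrail failures are health signals, so send them to the
    # team that owns the guardrail configuration.
    if alert.get("category") == "Requests With Guardrail Failure":
        return "platform-oncall"
    if alert.get("severity") in {"critical", "high"}:
        return "security-oncall"
    return "triage-queue"
```

Keeping the routing decision in a pure function, separate from the HTTP handler, makes it straightforward to unit test against recorded payloads.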
Related
Observatory → Threats: faceted search, investigation workflow, and event export
Observability: how Shield decisions appear in traces and sessions
Cedar Cookbook: tuning policy thresholds and @reject_message annotations
Integrations → Alerts: configuring Slack, Splunk, and webhook destinations