Cedar Cookbook
Cedar policy patterns for Highflame Shield — prompt injection blocking, PII controls, tool restrictions, delegation depth enforcement, trust level gates, and ZeroID claim integration.
This cookbook is a practical reference for writing, testing, and rolling out Cedar policies in Highflame Shield. It assumes you understand what Shield detectors produce (see Guardrails) and want to translate that output into enforceable runtime behavior.
Brief Cedar Primer
Cedar is an open-source policy language developed at AWS for expressing fine-grained authorization rules. It was purpose-built for the authorization problem: given a principal, an action, and a resource with associated context, should the request be permitted or denied? Compared to imperative code or JSON rule objects, Cedar policies are declarative and auditable. They can be read by non-engineers, analyzed statically for correctness, and composed without unexpected interactions.
For AI guardrails, Cedar is a particularly good fit because agentic systems involve many discrete, potentially consequential actions—executing a tool, reading a file, writing to external storage—that must be individually authorized. Cedar lets you express those boundaries as first-class policies rather than ad hoc conditionals scattered across application code. Because Shield projects detector outputs into a stable semantic context before Cedar evaluation, your policies stay readable even as detection algorithms are updated underneath them.
The Highflame policy system uses Cedar as its shared policy language across products. The same policy framework that governs MCP Gateway tool calls also governs Code Agent file operations and Shield API evaluations. That uniformity means audit logs, policy rollouts, and enforcement changes apply consistently instead of being managed separately per integration point.
The Evaluation Model
Every Shield evaluation follows a fixed pipeline:
Request arrives
│
▼
Detectors run
│ → raw scores, categories, pattern matches
▼
Projection layer
│ → normalizes output into stable semantic keys
▼
Cedar evaluation
│ → policies read context keys, emit permit / deny
▼
Decision returned
│ → action, signals, policy metadata, optional debug info
Detectors are the runtime analyzers: injection classifiers, secret pattern matchers, PII scanners, tool-risk evaluators. They run first and produce raw output specific to their implementation.
The projection layer transforms detector output into a stable set of semantic context keys. This is the contract your policies depend on. When a detector is retrained or replaced, the projection layer ensures the same keys appear with the same semantics, so existing policies do not break.
Cedar policies read the projected context and evaluate to permit or deny. A request is allowed only if at least one permit policy matches and no forbid policy matches. A matching forbid always overrides a permit, and a request that matches no policy at all is denied by default.
The decision carries the permit or deny outcome plus structured signals (which detectors fired, at what confidence), the Cedar policies that matched, and optional explain or debug context.
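The permit and forbid interaction described above can be sketched as a pair of policies. This is a minimal illustration, not a recommended production set: the action name `process_prompt` is a hypothetical placeholder, and only the `Guardrails::` namespace and the `jailbreak_score` context key come from this page.

```cedar
// Baseline permit: without it, Cedar denies every request by default.
// NOTE: the action name "process_prompt" is assumed for illustration.
permit (
  principal,
  action == Guardrails::Action::"process_prompt",
  resource
);

// Targeted forbid: whenever this matches, it overrides the permit above.
forbid (
  principal,
  action == Guardrails::Action::"process_prompt",
  resource
)
when { context.jailbreak_score >= 90 };
```

A request with a jailbreak_score of 95 matches both policies and is denied; a request with a score of 10 matches only the permit and is allowed.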
Available Context Keys
The following context keys are available inside Cedar policy conditions. They are populated by Shield's projection layer — a semantic normalization step that merges raw detector outputs into stable keys your policies can depend on. When a detector is retrained or replaced, the projection layer preserves the same key semantics.
Semantic Threat Scores
Key | Type | Range | Description
injection_score | integer | 0–100 | Prompt injection confidence. Synthesized from multiple detectors (Raudra classifier + DeepContext GRU).
jailbreak_score | integer | 0–100 | Jailbreak attempt confidence. Multi-turn aware via stateful GRU detector.
Content Safety Scores
Key | Type | Range | Description
violence_score | integer | 0–100 | Violence content score
hate_speech_score | integer | 0–100 | Hate speech score
sexual_score | integer | 0–100 | Sexual content score
weapons_score | integer | 0–100 | Weapons-related content score
crime_score | integer | 0–100 | Crime-related content score
profanity_score | integer | 0–100 | Profanity score
hallucination_score | integer | 0–100 | Factual inconsistency in model responses (requires contexts for grounding)
Sensitive Data Detection
Key | Type | Description
contains_secrets | boolean | true if API keys, tokens, passwords, private keys, or other credentials were detected (16+ secret formats)
secret_types | set (string) | Types of secrets found: aws_access_key, github_token, openai_key, stripe_key, pem_cert, ssh_key, etc.
secret_count | integer | Number of secrets detected
pii_detected | boolean | true if personally identifiable information was found
pii_types | set (string) | PII types found: ssn, credit_card, phone, email
keyword_matched | boolean | true if content matched a configured keyword filter
keyword_categories | set (string) | Categories of matched keywords
Tool & Code Security
Key | Type | Description
tool_risk | string | Risk classification: low, medium, or high. Destructive shell ops, mass-delete, and external write tools are typically high.
command_injection_detected | boolean | CLI injection patterns in tool arguments
sql_injection_detected | boolean | SQL injection payloads detected
path_traversal_detected | boolean | Directory traversal attempts (../, ..\\)
cross_origin_detected | boolean | Cross-origin resource escalation attempts
encoded_injection_detected | boolean | Base64 or URL-encoded injection payloads
Agent Security
Key | Type | Description
tool_poisoning_detected | boolean | Malicious instructions in tool descriptions
rug_pull_detected | boolean | Financial scam signatures detected
suspicious_pattern | boolean | Action sequences matching known attack trajectories
loop_detected | boolean | Repeated tool invocations indicating stuck or manipulated execution
multi_turn_detection | boolean | Multi-turn jailbreak pattern detected by stateful GRU
MCP Context
Key | Type | Description
mcp_risk | string | MCP server risk assessment based on capabilities
mcp_server_name | string | Name of the MCP server
mcp_transport | string | Transport protocol: sse, stdio, or http
mcp_verified | boolean | Whether the server passed trust/signature verification
Session History
Key | Type | Description
session_pii_detected | boolean | PII detected in any prior turn of this session
session_secrets_detected | boolean | Secrets detected in any prior turn
session_injection_detected | boolean | Injection detected in any prior turn
conversation_turn | integer | Current turn number in the session
Language & Content
Key | Type | Description
detected_language | string | ISO 639-1 language code (75 languages supported)
is_english | boolean | Whether content is in English
contains_code | boolean | Code snippet detected in content
phishing_detected | boolean | Malicious URLs detected
Use client.detect.run() to inspect the full projected context for a given request without Cedar evaluation. Use explain: true on client.guard.evaluate() to see projected context alongside Cedar decisions.
Common Policy Patterns
Namespace convention: Guardrails policies use the Guardrails:: namespace prefix. When writing policies in Studio, the namespace is applied automatically based on the product context. The examples below show the full namespaced form for clarity.
Block High-Confidence Injection
Deny any prompt-processing action when the injection detector has high confidence. A threshold of 80 is a conservative starting point for production; lower values catch more edge cases but increase false positives.
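A minimal sketch of such a policy follows. The `injection_score` key and the threshold of 80 come from this page; the action name `process_prompt` is a hypothetical placeholder and should be replaced with the action identifiers your Shield deployment actually exposes.

```cedar
// Block prompt processing when injection confidence is high.
// NOTE: the action name "process_prompt" is assumed for illustration.
forbid (
  principal,
  action == Guardrails::Action::"process_prompt",
  resource
)
when { context.injection_score >= 80 };
```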
For shadow evaluation before enforcement, run this policy in monitor mode. The decision will be recorded but requests will not be blocked, letting you observe false-positive rates before switching to enforce.
Deny Requests Containing Secrets
Block any action when a secret is present in the payload. This applies across prompt and tool call content types.
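One way to write this, assuming nothing beyond the `contains_secrets` context key listed earlier, is an unscoped forbid that leaves principal, action, and resource unconstrained:

```cedar
// Applies to every principal, action, and resource.
forbid (
  principal,
  action,
  resource
)
when { context.contains_secrets };
```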
If you want to scope this to outbound actions only (preventing secrets from leaving the system rather than blocking all requests):
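A sketch of the outbound-only variant is shown here. The action names `call_tool` and `write_file` are hypothetical stand-ins for whatever outbound actions exist in your schema; only the `contains_secrets` key is taken from this page.

```cedar
// Forbid only the actions that can move data out of the system.
// NOTE: both action names are assumed for illustration.
forbid (
  principal,
  action in [
    Guardrails::Action::"call_tool",
    Guardrails::Action::"write_file"
  ],
  resource
)
when { context.contains_secrets };
```

Listing actions explicitly keeps inbound prompt handling unaffected, at the cost of having to update the list when new outbound actions are introduced.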
Restrict Destructive Tools to Approved Environments
High-risk tools should be blocked outside of explicitly approved environments. Use a default-deny pattern: forbid high-risk tools everywhere except the approved environments, then permit the approved cases.
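The pattern could look like the following sketch. It assumes that the resource is a member of an environment entity, so that `resource in ...` expresses "inside this environment"; that modelling choice, the `call_tool` action name, and the `Guardrails::Environment` entity type are illustrative assumptions. The identifier "production-sandbox" is reused from the anti-patterns discussion later on this page.

```cedar
// Default: block high-risk tools everywhere except the approved environment.
// NOTE: action name and environment modelling are assumed for illustration.
forbid (
  principal,
  action == Guardrails::Action::"call_tool",
  resource
)
when { context.tool_risk == "high" }
unless { resource in Guardrails::Environment::"production-sandbox" };

// Exception: explicitly allow tool calls inside the approved environment.
permit (
  principal,
  action == Guardrails::Action::"call_tool",
  resource
)
when { resource in Guardrails::Environment::"production-sandbox" };
```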
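Note that Cedar has no evaluation order between policies: a matching forbid always overrides any permit. The forbid establishes the default and must itself exclude the approved environments (for example, with an unless clause). The permit then authorizes the approved case, because a permit on its own cannot override a forbid.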
Require First-Party Trust for Admin Operations
Some operations should only proceed when the requesting principal has been verified as a first-party actor (for example, a token minted by your own authorization server rather than a delegated or external agent).
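A hedged sketch of this gate is shown below. It assumes the principal entity carries a string attribute named `trust_level` and that admin operations are represented by an action called `admin_operation`; both names are hypothetical, and only the `first_party` value is taken from this page.

```cedar
// Forbid admin operations unless the principal is verified first-party.
// NOTE: "admin_operation" and "trust_level" are assumed names.
forbid (
  principal,
  action == Guardrails::Action::"admin_operation",
  resource
)
unless {
  principal has trust_level &&
  principal.trust_level == "first_party"
};
```

The `has` check matters: reading a missing attribute raises an evaluation error, and Cedar skips policies that error, which would silently disable the forbid for principals without a trust level.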
The unless clause inverts the condition: this forbid applies to all requests except those where the principal carries first_party trust.
Limit Delegation Depth for Sub-Agents
In multi-agent workflows, sub-agents may delegate further to tools or nested agents. Without a depth limit, a compromised sub-agent can create arbitrarily deep delegation chains. Enforce a maximum.
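A depth cap might be expressed as follows. The context attribute `delegation_depth` and the limit of 3 are assumptions made for illustration; substitute the claim name and limit that your identity provider and threat model actually use.

```cedar
// Deny any request whose delegation chain exceeds the allowed depth.
// NOTE: "delegation_depth" and the limit of 3 are assumed for illustration.
forbid (
  principal,
  action,
  resource
)
when {
  context has delegation_depth &&
  context.delegation_depth > 3
};
```

As written, requests that carry no depth value are not blocked by this policy. If a missing value should be treated as untrusted, add a second forbid with the condition `!(context has delegation_depth)`.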
Log-Only for New Detectors (Monitor Mode)
When deploying a new detector, start with a monitor-mode policy. The policy structure is identical to an enforcement policy, but the application runs in monitor mode so decisions are recorded without blocking requests. This lets you observe detector behavior on real traffic before committing to enforcement.
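For example, a policy written against a newly enabled detector could look like this sketch. It uses the `phishing_detected` key from the context tables; the `@id` annotation value is an arbitrary label chosen here to make the policy easy to find in recorded decisions.

```cedar
// Same shape as an enforcement policy; whether it blocks depends on
// the application mode, not on anything in the policy text.
@id("monitor-phishing-detector")
forbid (
  principal,
  action,
  resource
)
when { context.phishing_detected };
```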
Set the application mode to monitor in Studio or via the Shield API (mode: "monitor"). When the false-positive rate is acceptable, switch to enforce without changing the policy itself.
Allow Tool Calls Only for Specific Scopes
If agents present OAuth2 scopes as part of their identity, you can gate tool execution on scope membership.
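A minimal sketch, assuming the principal exposes its verified scopes as a set-of-strings attribute named `scopes`; the attribute name, the scope string `tools:execute`, and the `call_tool` action are all hypothetical.

```cedar
// Permit tool calls only for principals holding the required scope.
// NOTE: "scopes", "tools:execute", and "call_tool" are assumed names.
permit (
  principal,
  action == Guardrails::Action::"call_tool",
  resource
)
when {
  principal has scopes &&
  principal.scopes.contains("tools:execute")
};
```

Because Cedar denies by default, a principal without the scope is simply not permitted; no matching forbid is needed unless another permit would otherwise allow the call.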
Scope-based policies are most useful when agents authenticate via ZeroID or another OAuth2 provider and present a token with a verified scopes claim.
Testing Policies with Debug Mode
Before deploying any policy change, validate it against real or representative payloads using Shield's testing modes.
detect.run() skips Cedar enforcement entirely and returns the full detector context that policies would receive. Use this to understand what context keys are available and what values your detectors are producing for a given input.
explain: true adds the projected context, root-cause information, and determining policies to a normal (Cedar-enforced) evaluation response. Use this to understand why a specific policy matched or did not match.
debug: true adds per-detector execution details including timing and raw detector output. Use this when troubleshooting why a specific detector is or is not firing. Implies explain: true.
Playground in Studio provides an interactive UI for the same capability. Navigate to Studio → Observatory → Playground to test policies against typed or pasted inputs, select a specific policy set, use the built-in attack library, and observe the Cedar decision and projected context in real time. See Policy Playground for a full walkthrough of the policy testing workflow.
Policy Rollout Strategy
Rolling out a new policy in four stages reduces the risk of blocking legitimate traffic.
Stage 1: Detection only (client.detect.run())
Use the detect endpoint to observe what detectors produce on real traffic without Cedar evaluation. Review the projected context values and verify that the signals your policy depends on are present with expected values. Typical duration: a few hours to a day.
Stage 2: Monitor mode (mode: "monitor")
Write the policy and deploy with mode: "monitor". Cedar evaluates and records decisions, but does not block requests. The response includes actual_decision showing what would have happened. Monitor for false positives and false negatives. Typical duration: one to several days.
Stage 3: Alert mode (mode: "alert")
Switch to mode: "alert". Requests still pass through, but violations trigger the alerting pipeline (resp.alerted = true). Tune response playbooks before enforcement. Typical duration: one to several days.
Stage 4: Enforce (mode: "enforce")
Switch to mode: "enforce". Cedar decisions now block requests when policies resolve to deny. Roll out to a subset of traffic first (canary), then expand.
If enforcement causes unexpected blocks, switch back to monitor immediately and investigate using explain: true on the affected request payloads.
Common Mistakes and Anti-Patterns
Writing deny-only policies without a default permit. Cedar requires an explicit permit for a request to be allowed. If you write only forbid policies, every request will be denied (even if no forbid matches) because there is no permit. Always define a baseline permit for your expected traffic, then add targeted forbids.
Using injection_score as the sole gate for all actions. Injection score is calibrated for prompt content. Applying it to tool calls or model responses introduces false positives because those content types have different linguistic patterns. Scope policies to the appropriate content_type and action.
Setting thresholds too low during initial rollout. A threshold of 40 on injection_score will block a significant fraction of legitimate rephrased instructions. Start at 80 in monitor mode, measure your false-positive rate, and lower only if recall is insufficient for your threat model.
Treating monitor as equivalent to disabled. monitor mode still evaluates Cedar and records decisions. Policies in monitor mode produce audit events that count toward reporting and alert thresholds. Do not use monitor as a way to "disable" a policy silently.
Skipping detection-only validation when adding a new detector. New detectors may produce unexpected context values until calibrated for your traffic. Always use client.detect.run() to verify the projected context before writing policies against it.
Writing overlapping forbids without understanding precedence. In Cedar a forbid overrides any permit, and forbids never cancel each other out: if any one of them matches, the request is denied. Review all active policies together before deploying a new one. Use the Playground to test interactions.
Hard-coding environment names in policies without a registry. If you reference Environment::"production-sandbox" in policies, that name must be maintained consistently across all policy updates and environment provisioning. Define environment identifiers in a managed registry rather than embedding them ad hoc.
What's Next
Guardrails Overview — how Shield detectors and the projection layer work
Guardrail APIs — calling Shield directly for per-request evaluation
Bounded Functional Units — composing detectors into processor chains
Governance & Reporting — audit archive, reporting, and compliance framework coverage
Threat Alerts — routing policy violations to Slack, Splunk, and other destinations