Cedar Cookbook

Cedar policy patterns for Highflame Shield — prompt injection blocking, PII controls, tool restrictions, delegation depth enforcement, trust level gates, and ZeroID claim integration.

This cookbook is a practical reference for writing, testing, and rolling out Cedar policies in Highflame Shield. It assumes you understand what Shield detectors produce (see Guardrails) and want to translate that output into enforceable runtime behavior.

Brief Cedar Primer

Cedar is an open-source policy language developed at AWS for expressing fine-grained authorization rules. It was purpose-built for the authorization problem: given a principal, an action, and a resource with associated context, should the request be permitted or denied? Compared to imperative code or JSON rule objects, Cedar policies are declarative and auditable. They can be read by non-engineers and analyzed statically for correctness. They can also be combined freely, because the outcome does not depend on the order in which policies are written.

For AI guardrails, Cedar is a particularly good fit because agentic systems involve many discrete, potentially consequential actions—executing a tool, reading a file, writing to external storage—that must be individually authorized. Cedar lets you express those boundaries as first-class policies rather than ad hoc conditionals scattered across application code. Because Shield projects detector outputs into a stable semantic context before Cedar evaluation, your policies stay readable even as detection algorithms are updated underneath them.

The Highflame policy system uses Cedar as its shared policy language across products. The same policy framework that governs MCP Gateway tool calls also governs Code Agent file operations and Shield API evaluations. That uniformity means audit logs, policy rollouts, and enforcement changes apply consistently instead of being managed separately per integration point.


The Evaluation Model

Every Shield evaluation follows a fixed pipeline:

Request arrives
    │
    ▼
Detectors run
    │  → raw scores, categories, pattern matches
    ▼
Projection layer
    │  → normalizes output into stable semantic keys
    ▼
Cedar evaluation
    │  → policies read context keys, emit permit / deny
    ▼
Decision returned
       → action, signals, policy metadata, optional debug info

Detectors are the runtime analyzers: injection classifiers, secret pattern matchers, PII scanners, tool-risk evaluators. They run first and produce raw output specific to their implementation.

The projection layer transforms detector output into a stable set of semantic context keys. This is the contract your policies depend on. When a detector is retrained or replaced, the projection layer ensures the same keys appear with the same semantics, so existing policies do not break.
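The adapter idea can be sketched in Python. Everything in this example is hypothetical: the detector names (`classifier_v2`, `sequence_model`, `secret_scanner`), their raw field names, and the use of `max()` to combine two injection signals are illustrative assumptions, not how Shield actually synthesizes `injection_score`. What the sketch shows is the contract: whatever the raw shapes, the output always uses the same keys.

```python
from typing import Any, Dict, List

# Hypothetical raw detector output: each detector reports its own shape.
RawOutput = Dict[str, Any]


def project(raw: Dict[str, RawOutput]) -> Dict[str, Any]:
    """Normalize raw detector output into stable context keys.

    Assumptions: detector names, raw field names, and combining by max()
    are illustrative only, not Shield's actual synthesis.
    """
    injection_signals: List[float] = []
    if "classifier_v2" in raw:
        # This hypothetical detector reports a probability in [0, 1].
        injection_signals.append(raw["classifier_v2"]["p_injection"] * 100)
    if "sequence_model" in raw:
        # This hypothetical detector already reports 0-100.
        injection_signals.append(raw["sequence_model"]["score"])
    secrets = raw.get("secret_scanner", {}).get("matches", [])
    # Policies only ever see these keys, whatever detectors produced them.
    return {
        "injection_score": round(max(injection_signals, default=0)),
        "contains_secrets": bool(secrets),
        "secret_types": {m["type"] for m in secrets},
        "secret_count": len(secrets),
    }
```

Replacing a detector means changing one branch of `project`; policies that read `injection_score` or `contains_secrets` are unaffected.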

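Cedar policies read the projected context and evaluate permit or deny. A request is allowed only if at least one permit policy matches and no forbid policy matches; if no permit matches, the request is denied by default. A matching forbid always overrides matching permits, regardless of the order in which the policies are written.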
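To make the combining rule concrete, here is a minimal Python model of it. It is not Cedar syntax and not Shield's evaluator; the `Policy` class and `authorize` function are hypothetical, and conditions are plain Python callables over a projected-context dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class Policy:
    # Hypothetical stand-in for a compiled Cedar policy.
    effect: str  # "permit" or "forbid"
    condition: Callable[[Context], bool]


def authorize(policies: List[Policy], ctx: Context) -> str:
    """Return "allow" or "deny" using Cedar's combining rule."""
    matched = [p for p in policies if p.condition(ctx)]
    # Any matching forbid overrides every matching permit.
    if any(p.effect == "forbid" for p in matched):
        return "deny"
    # With no matching forbid, at least one permit must match.
    if any(p.effect == "permit" for p in matched):
        return "allow"
    # Nothing matched: Cedar denies by default.
    return "deny"


baseline = Policy("permit", lambda ctx: True)
block_injection = Policy("forbid", lambda ctx: ctx.get("injection_score", 0) >= 80)

print(authorize([baseline, block_injection], {"injection_score": 12}))  # allow
print(authorize([baseline, block_injection], {"injection_score": 93}))  # deny
print(authorize([block_injection], {"injection_score": 12}))            # deny: no permit
```

Reversing the order of the policy list does not change any result, and the last call shows why a policy set containing only forbids denies every request.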

The decision carries the permit or deny outcome plus structured signals (which detectors fired, at what confidence), the Cedar policies that matched, and optional explain or debug context.


Available Context Keys

The following context keys are available inside Cedar policy conditions. Shield's projection layer populates them by merging raw detector outputs into stable, semantically normalized keys that your policies can depend on.

Semantic Threat Scores

| Key | Type | Range | Description |
|---|---|---|---|
| injection_score | integer | 0–100 | Prompt injection confidence. Synthesized from multiple detectors (Raudra classifier + DeepContext GRU). |
| jailbreak_score | integer | 0–100 | Jailbreak attempt confidence. Multi-turn aware via stateful GRU detector. |

Content Safety Scores

| Key | Type | Range | Description |
|---|---|---|---|
| violence_score | integer | 0–100 | Violence content score |
| hate_speech_score | integer | 0–100 | Hate speech score |
| sexual_score | integer | 0–100 | Sexual content score |
| weapons_score | integer | 0–100 | Weapons-related content score |
| crime_score | integer | 0–100 | Crime-related content score |
| profanity_score | integer | 0–100 | Profanity score |
| hallucination_score | integer | 0–100 | Factual inconsistency in model responses (requires contexts for grounding) |

Sensitive Data Detection

| Key | Type | Description |
|---|---|---|
| contains_secrets | boolean | true if API keys, tokens, passwords, private keys, or other credentials were detected (16+ secret formats) |
| secret_types | set (string) | Types of secrets found: aws_access_key, github_token, openai_key, stripe_key, pem_cert, ssh_key, etc. |
| secret_count | integer | Number of secrets detected |
| pii_detected | boolean | true if personally identifiable information was found |
| pii_types | set (string) | PII types found: ssn, credit_card, phone, email |
| keyword_matched | boolean | true if content matched a configured keyword filter |
| keyword_categories | set (string) | Categories of matched keywords |

Tool & Code Security

| Key | Type | Description |
|---|---|---|
| tool_risk | string | Risk classification: low, medium, or high. Destructive shell ops, mass-delete, and external write tools are typically high. |
| command_injection_detected | boolean | CLI injection patterns in tool arguments |
| sql_injection_detected | boolean | SQL injection payloads detected |
| path_traversal_detected | boolean | Directory traversal attempts (../ or ..\) |
| cross_origin_detected | boolean | Cross-origin resource escalation attempts |
| encoded_injection_detected | boolean | Base64 or URL-encoded injection payloads |

Agent Security

| Key | Type | Description |
|---|---|---|
| tool_poisoning_detected | boolean | Malicious instructions in tool descriptions |
| rug_pull_detected | boolean | Financial scam signatures detected |
| suspicious_pattern | boolean | Action sequences matching known attack trajectories |
| loop_detected | boolean | Repeated tool invocations indicating stuck or manipulated execution |
| multi_turn_detection | boolean | Multi-turn jailbreak pattern detected by stateful GRU |

MCP Context

| Key | Type | Description |
|---|---|---|
| mcp_risk | string | MCP server risk assessment based on capabilities |
| mcp_server_name | string | Name of the MCP server |
| mcp_transport | string | Transport protocol: sse, stdio, or http |
| mcp_verified | boolean | Whether the server passed trust/signature verification |

Session History

| Key | Type | Description |
|---|---|---|
| session_pii_detected | boolean | PII detected in any prior turn of this session |
| session_secrets_detected | boolean | Secrets detected in any prior turn |
| session_injection_detected | boolean | Injection detected in any prior turn |
| conversation_turn | integer | Current turn number in the session |

Language & Content

| Key | Type | Description |
|---|---|---|
| detected_language | string | ISO 639-1 language code (75 languages supported) |
| is_english | boolean | Whether content is in English |
| contains_code | boolean | Code snippet detected in content |
| phishing_detected | boolean | Malicious URLs detected |

Use client.detect.run() to inspect the full projected context for a given request without Cedar evaluation. Use explain: true on client.guard.evaluate() to see projected context alongside Cedar decisions.


Common Policy Patterns

Namespace convention: Guardrails policies use the Guardrails:: namespace prefix. When writing policies in Studio, the namespace is applied automatically based on the product context. The examples below show the full namespaced form for clarity.

Block High-Confidence Injection

Deny any prompt-processing action when the injection detector has high confidence. A threshold of 80 is a conservative starting point for production; lower values catch more edge cases but increase false positives.

For shadow evaluation before enforcement, run this policy in monitor mode. The decision will be recorded but requests will not be blocked, letting you observe false-positive rates before switching to enforce.
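The condition this forbid expresses can be modeled as a Python predicate. It is a sketch, not the Cedar policy: the threshold of 80 comes from the text above, while the `content_type` key and its `"prompt"` value are assumptions based on the advice elsewhere in this cookbook to scope injection checks to prompt content.

```python
from typing import Any, Dict

INJECTION_THRESHOLD = 80  # conservative production starting point


def should_block_injection(ctx: Dict[str, Any], threshold: int = INJECTION_THRESHOLD) -> bool:
    """Return True when the injection forbid would match.

    Assumes the projected context carries `content_type` and `injection_score`;
    the "prompt" label is a hypothetical value, not a documented enum.
    """
    # Scope to prompt content: the score is calibrated for prompts,
    # so applying it to tool calls or responses inflates false positives.
    if ctx.get("content_type") != "prompt":
        return False
    return ctx.get("injection_score", 0) >= threshold
```

Keeping the threshold as a parameter makes it easy to replay the same traffic at several thresholds while the policy runs in monitor mode.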


Deny Requests Containing Secrets

Block any action when a secret is present in the payload. This applies across prompt and tool call content types.

If you want to scope this to outbound actions only (preventing secrets from leaving the system rather than blocking all requests):
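Both variants, blocking every action versus only outbound ones, can be modeled as one Python predicate over the documented `contains_secrets` key. This is a sketch rather than Cedar syntax, and the outbound action names are hypothetical placeholders; substitute the actions your application actually defines.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical action names; substitute the actions your application defines.
OUTBOUND_ACTIONS: FrozenSet[str] = frozenset(
    {"send_message", "http_request", "write_external_storage"}
)


def blocks_secret(action: str, ctx: Dict[str, Any], outbound_only: bool = False) -> bool:
    """Return True when the secrets forbid would match."""
    if not ctx.get("contains_secrets", False):
        return False
    # Broad variant: any action carrying a secret is denied.
    if not outbound_only:
        return True
    # Scoped variant: deny only when the secret would leave the system.
    return action in OUTBOUND_ACTIONS
```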


Restrict Destructive Tools to Approved Environments

High-risk tools should be blocked outside of explicitly approved environments. Use a permit-only pattern: deny by default for high-risk tools, then permit for approved cases.

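Note that Cedar does not evaluate policies in order: every policy is checked, and a single matching forbid overrides any number of matching permits. A permit therefore cannot carve an exception out of a forbid. Either write the exception into the forbid itself with an unless clause that excludes approved environments, or omit the forbid, rely on Cedar's default deny, and write permits that match high-risk tools only in approved environments.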
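Because a matching forbid overrides every matching permit, an approved-environment exception has to live inside the forbid itself. This small Python model compares an unconditional forbid plus a permit with a forbid that excludes approved environments, the equivalent of a Cedar unless clause. The `tool_risk` key is documented above; the `environment` key and the approved environment name are assumptions for illustration, and conditions are Python callables rather than Cedar expressions.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Rule = Tuple[str, Callable[[Context], bool]]  # (effect, condition)

# Hypothetical approved environment identifiers.
APPROVED_ENVIRONMENTS = {"production-sandbox"}


def decide(rules: List[Rule], ctx: Context) -> str:
    """Cedar combining rule: any matching forbid denies; else a matching permit allows."""
    matched = [effect for effect, cond in rules if cond(ctx)]
    if "forbid" in matched:
        return "deny"
    return "allow" if "permit" in matched else "deny"


def high_risk(ctx: Context) -> bool:
    return ctx.get("tool_risk") == "high"


def approved(ctx: Context) -> bool:
    return ctx.get("environment") in APPROVED_ENVIRONMENTS


# Broken: the permit cannot override the unconditional forbid.
forbid_then_permit: List[Rule] = [
    ("forbid", high_risk),
    ("permit", lambda ctx: high_risk(ctx) and approved(ctx)),
]

# Working: the exception lives inside the forbid, like a Cedar `unless` clause.
forbid_unless: List[Rule] = [
    ("forbid", lambda ctx: high_risk(ctx) and not approved(ctx)),
    ("permit", lambda ctx: True),  # baseline permit for expected traffic
]

sandbox_call = {"tool_risk": "high", "environment": "production-sandbox"}
print(decide(forbid_then_permit, sandbox_call))  # deny
print(decide(forbid_unless, sandbox_call))       # allow
```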


Require First-Party Trust for Admin Operations

Some operations should only proceed when the requesting principal has been verified as a first-party actor (for example, a token minted by your own authorization server rather than a delegated or external agent).

The unless clause inverts the condition: this forbid applies to all requests except those where the principal carries first_party trust.
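The effect of the unless clause can be modeled in Python as follows. This is a sketch, not Cedar syntax: the `ADMIN_ACTIONS` set and the `trust_level` attribute name are hypothetical, and only the `first_party` trust value comes from the text above.

```python
from typing import Any, Dict

# Hypothetical admin actions; your application defines the real set.
ADMIN_ACTIONS = {"rotate_keys", "delete_tenant", "update_policy"}


def admin_forbid_matches(action: str, principal: Dict[str, Any]) -> bool:
    """Model of `forbid ... when { admin action } unless { first-party trust }`."""
    if action not in ADMIN_ACTIONS:
        return False
    # `unless` inverts the condition: the forbid is skipped for first-party principals.
    # A principal with no trust attribute at all is treated as untrusted.
    return principal.get("trust_level") != "first_party"
```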


Limit Delegation Depth for Sub-Agents

In multi-agent workflows, sub-agents may delegate further to tools or nested agents. Without a depth limit, a compromised sub-agent can create arbitrarily deep delegation chains. Enforce a maximum.
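Because Cedar has no loops, a depth policy can only compare a depth that has already been computed, for example an integer attribute derived from the delegation chain. This Python sketch shows that computation and the comparison. The chain representation (a list of principal identifiers ordered from the root outward) and the limit of 3 are hypothetical.

```python
from typing import List

MAX_DELEGATION_DEPTH = 3  # hypothetical limit; choose one that fits your workflows


def delegation_depth(chain: List[str]) -> int:
    """Number of delegation hops in a chain ordered from the root principal outward.

    ["user", "planner"] is one hop; a chain containing only the root is zero hops.
    """
    return max(len(chain) - 1, 0)


def exceeds_depth(chain: List[str], limit: int = MAX_DELEGATION_DEPTH) -> bool:
    """Return True when a depth-limit forbid would match."""
    return delegation_depth(chain) > limit
```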


Log-Only for New Detectors (Monitor Mode)

When deploying a new detector, start with a monitor-mode policy. The policy structure is identical to an enforcement policy, but the application runs in monitor mode so decisions are recorded without blocking requests. This lets you observe detector behavior on real traffic before committing to enforcement.

Set the application mode to monitor in Studio or via the Shield API (mode: "monitor"). When the false-positive rate is acceptable, switch to enforce without changing the policy itself.


Allow Tool Calls Only for Specific Scopes

If agents present OAuth2 scopes as part of their identity, you can gate tool execution on scope membership.

Scope-based policies are most useful when agents authenticate via ZeroID or another OAuth2 provider and present a token with a verified scopes claim.
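The scope gate can be sketched in Python as a permit condition. The tool-to-scope mapping, the scope strings, and the `scopes` attribute on the principal are hypothetical; in a Cedar policy the equivalent membership test would use the set method `contains` on the principal's scopes attribute.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPES: Dict[str, str] = {
    "search_docs": "tools:read",
    "run_shell": "tools:execute",
}


def tool_call_permitted(tool: str, principal: Dict[str, Any]) -> bool:
    """Model of a permit that matches only when the token carries the tool's scope.

    Unknown tools match no permit, so Cedar's default deny applies.
    """
    required = REQUIRED_SCOPES.get(tool)
    if required is None:
        return False
    scopes: FrozenSet[str] = frozenset(principal.get("scopes", ()))
    return required in scopes
```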


Testing Policies with Debug Mode

Before deploying any policy change, validate it against real or representative payloads using Shield's testing modes.

detect.run() skips Cedar enforcement entirely and returns the full detector context that policies would receive. Use this to understand what context keys are available and what values your detectors are producing for a given input.

explain: true adds the projected context, root-cause information, and determining policies to a normal (Cedar-enforced) evaluation response. Use this to understand why a specific policy matched or did not match.

debug: true adds per-detector execution details including timing and raw detector output. Use this when troubleshooting why a specific detector is or is not firing. Implies explain: true.

Playground in Studio provides an interactive UI for the same capability. Navigate to Studio → Observatory → Playground to test policies against typed or pasted inputs, select a specific policy set, use the built-in attack library, and observe the Cedar decision and projected context in real time. See Policy Playground for a full walkthrough of the policy testing workflow.


Policy Rollout Strategy

Rolling out a new policy in four stages reduces the risk of blocking legitimate traffic.

Stage 1: Detection only (client.detect.run())

Use the detect endpoint to observe what detectors produce on real traffic without Cedar evaluation. Review the projected context values and verify that the signals your policy depends on are present with expected values. Typical duration: a few hours to a day.

Stage 2: Monitor mode (mode: "monitor")

Write the policy and deploy with mode: "monitor". Cedar evaluates and records decisions, but does not block requests. The response includes actual_decision showing what would have happened. Monitor for false positives and false negatives. Typical duration: one to several days.

Stage 3: Alert mode (mode: "alert")

Switch to mode: "alert". Requests still pass through, but violations trigger the alerting pipeline (resp.alerted = true). Tune response playbooks before enforcement. Typical duration: one to several days.

Stage 4: Enforce (mode: "enforce")

Switch to mode: "enforce". Cedar decisions now block requests when policies resolve to deny. Roll out to a subset of traffic first (canary), then expand.

If enforcement causes unexpected blocks, switch back to monitor immediately and investigate using explain: true on the affected request payloads.
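How each mode turns a Cedar decision into a request outcome can be summarized in a short Python model. The `Outcome` class and `apply_mode` function are hypothetical; the `actual_decision` and `alerted` fields mirror the response fields named in the stages above. Whether enforce mode also triggers alerts is not described here, so the sketch does not model it.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    allowed: bool         # does the request proceed?
    actual_decision: str  # what Cedar decided: "allow" or "deny"
    alerted: bool         # was the alerting pipeline triggered?


def apply_mode(cedar_decision: str, mode: str) -> Outcome:
    """Map a Cedar decision to a request outcome under each rollout mode."""
    denied = cedar_decision == "deny"
    if mode == "monitor":
        # Decision is recorded; the request always proceeds.
        return Outcome(True, cedar_decision, False)
    if mode == "alert":
        # The request proceeds, but violations trigger alerting.
        return Outcome(True, cedar_decision, denied)
    if mode == "enforce":
        # Deny decisions block the request. Alerting is not modeled here.
        return Outcome(not denied, cedar_decision, False)
    raise ValueError(f"unknown mode: {mode!r}")
```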


Common Mistakes and Anti-Patterns

Writing deny-only policies without a default permit. Cedar requires an explicit permit for a request to be allowed. If you write only forbid policies, every request will be denied (even if no forbid matches) because there is no permit. Always define a baseline permit for your expected traffic, then add targeted forbids.

Using injection_score as the sole gate for all actions. Injection score is calibrated for prompt content. Applying it to tool calls or model responses introduces false positives because those content types have different linguistic patterns. Scope policies to the appropriate content_type and action.

Setting thresholds too low during initial rollout. A threshold of 40 on injection_score will block a significant fraction of legitimate rephrased instructions. Start at 80 in monitor mode, measure your false-positive rate, and lower only if recall is insufficient for your threat model.
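Measuring the false-positive rate, and the recall you would trade against it, needs labeled samples. This Python sketch assumes you have reviewed monitor-mode decisions and recorded each request's `injection_score` along with your own judgment of whether it was an attack; the function name and sample format are hypothetical.

```python
from typing import List, Tuple


def rates_at(samples: List[Tuple[int, bool]], threshold: int) -> Tuple[float, float]:
    """Return (false_positive_rate, recall) for a `score >= threshold` forbid.

    Each sample is (injection_score, is_attack). A rate whose denominator
    is empty (no benign or no attack samples) is reported as 0.0.
    """
    benign = [score for score, is_attack in samples if not is_attack]
    attacks = [score for score, is_attack in samples if is_attack]
    # Share of benign requests the policy would block.
    fpr = sum(score >= threshold for score in benign) / len(benign) if benign else 0.0
    # Share of attacks the policy would catch.
    recall = sum(score >= threshold for score in attacks) / len(attacks) if attacks else 0.0
    return fpr, recall
```

Running the same labeled set at 80 and then at lower thresholds shows how much recall each step down buys and what it costs in blocked legitimate traffic.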

Treating monitor as equivalent to disabled. monitor mode still evaluates Cedar and records decisions. Policies in monitor mode produce audit events that count toward reporting and alert thresholds. Do not use monitor as a way to "disable" a policy silently.

Skipping detection-only validation when adding a new detector. New detectors may produce unexpected context values until calibrated for your traffic. Always use client.detect.run() to verify the projected context before writing policies against it.
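A quick sanity check on projected context can catch surprises before any policy depends on it. This Python sketch assumes you have turned the detect response into a plain dictionary of context keys (the response shape of `client.detect.run()` is not specified here) and checks a few documented keys against their documented types and ranges.

```python
from typing import Any, Dict, List

# A few documented keys; extend these with the keys your policies read.
SCORE_KEYS = ("injection_score", "jailbreak_score")
BOOL_KEYS = ("contains_secrets", "pii_detected")
TOOL_RISK_VALUES = {"low", "medium", "high"}


def check_context(ctx: Dict[str, Any]) -> List[str]:
    """Return problems found in a projected context dict (empty list means OK)."""
    problems: List[str] = []
    for key in SCORE_KEYS:
        value = ctx.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or not 0 <= value <= 100:
            problems.append(f"{key}: expected integer 0-100, got {value!r}")
    for key in BOOL_KEYS:
        if not isinstance(ctx.get(key), bool):
            problems.append(f"{key}: expected boolean, got {ctx.get(key)!r}")
    if "tool_risk" in ctx and ctx["tool_risk"] not in TOOL_RISK_VALUES:
        problems.append(f"tool_risk: unexpected value {ctx['tool_risk']!r}")
    return problems
```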

Writing overlapping forbids without understanding precedence. A matching forbid always wins over any permit, and overlapping forbids never cancel each other out: if any one of them matches, the request is denied. Review all active policies together before deploying a new one, and use the Playground to test how they interact.

Hard-coding environment names in policies without a registry. If you reference Environment::"production-sandbox" in policies, that name must be maintained consistently across all policy updates and environment provisioning. Define environment identifiers in a managed registry rather than embedding them ad hoc.

