Guardrails

Highflame Guardrails are powered by Shield, the runtime service that evaluates prompts, model outputs, tool calls, and session context before an agent action is allowed to proceed. You can call Shield directly through the Shield API, or use it inline through products such as MCP Gateway and Code Agent Security.

Guardrails are designed for agentic systems rather than single-message moderation. Shield tracks tool usage, session history, cumulative risk, and action sequences, so decisions can incorporate what happened earlier in the session instead of only inspecting the current payload.

What Shield Evaluates

Shield is built to catch both content risks and execution risks. The current runtime capabilities include:

prompt injection and jailbreak attempts
sensitive data leakage, including secrets and PII
dangerous tool usage such as destructive shell or file operations
infinite loops and repeated tool-call patterns
token budget overruns and agent cost controls
suspicious action sequences such as read-to-exfiltration chains
cross-turn data exposure, where prior sensitive context blocks later writes
MCP-specific risks such as tool poisoning and token misuse

These detections are surfaced as structured signals, and Cedar policies decide whether the final runtime action should be allowed, denied, or logged.

How Guard Evaluation Works

Shield follows a consistent evaluation pipeline:

Detectors inspect the request and produce a raw security context.
A projection layer normalizes that context into stable semantic keys.
Cedar policies evaluate the projected context.
Shield returns a runtime decision together with signals, policy metadata, and optional debugging context.

This split matters because it lets you evolve detection logic without rewriting all of your policies. Policies target stable semantic context, such as injection confidence, secret presence, or tool risk, rather than service-internal implementation details.

Runtime Modes

Shield supports a few different ways to run guardrails depending on where you are in the rollout cycle:

enforce blocks actions when policies resolve to deny
log_only returns the decision but does not block, which is useful for shadow evaluation
detect_only skips Cedar enforcement and returns detector context for debugging and tuning

For deeper inspection, Shield also supports richer response tiers:

the default response is optimized for application integration
explain=true adds projected context and root-cause information
debug=true adds per-detector execution details for profiling and troubleshooting

Policy-Driven Enforcement

Guardrails are not just a bundle of detectors. Runtime behavior is controlled by Cedar policies managed through Highflame's shared policy system. That lets you define organization-specific rules such as:

blocking high-confidence injection attempts in production
preventing tools with destructive side effects from running outside approved environments
denying external writes when earlier turns contained secrets or PII
requiring stricter thresholds for public-facing agents than internal workflows

Because the same policy framework is used across products, teams can keep runtime controls auditable and consistent instead of hardcoding enforcement into each application.

Managing Guardrails In Studio

Studio is the operational UI for guardrails management. Teams typically use it to:

configure guardrail profiles and baseline protections
manage project-level and agent-level policies
inspect detector metadata and policy outcomes
generate service keys for SDK and API usage
validate behavior in the playground before shipping policy changes

The goal is to make Shield usable as a production control surface, not just a backend API. Developers can integrate Shield directly, while security and platform teams can tune enforcement centrally without having to redeploy every client.

PreviousAgent Control Plane NextMCP Registry

Last updated 6 days ago

Good afternoon

hashtagWhat Shield Evaluates

hashtagHow Guard Evaluation Works

hashtagRuntime Modes

hashtagPolicy-Driven Enforcement

hashtagManaging Guardrails In Studio

What Shield Evaluates

How Guard Evaluation Works

Runtime Modes

Policy-Driven Enforcement

Managing Guardrails In Studio