Guardrails
Highflame Guardrails are powered by Shield, the runtime service that evaluates prompts, model outputs, tool calls, and session context before an agent action is allowed to proceed. You can call Shield directly through the Shield API, or use it inline through products such as MCP Gateway and Code Agent Security.
Guardrails are designed for agentic systems rather than single-message moderation. Shield tracks tool usage, session history, cumulative risk, and action sequences, so decisions can incorporate what happened earlier in the session instead of only inspecting the current payload.
What Shield Evaluates
Shield is built to catch both content risks and execution risks. The current runtime capabilities include:
prompt injection and jailbreak attempts
sensitive data leakage, including secrets and PII
dangerous tool usage such as destructive shell or file operations
infinite loops and repeated tool-call patterns
token budget overruns and agent cost controls
suspicious action sequences such as read-to-exfiltration chains
cross-turn data exposure, where prior sensitive context blocks later writes
MCP-specific risks such as tool poisoning and token misuse
These detections are surfaced as structured signals, and Cedar policies decide whether the final runtime action should be allowed, denied, or logged.
How Guard Evaluation Works
Shield follows a consistent evaluation pipeline:
Detectors inspect the request and produce a raw security context.
A projection layer normalizes that context into stable semantic keys.
Cedar policies evaluate the projected context.
Shield returns a runtime decision together with signals, policy metadata, and optional debugging context.
This split matters because it lets you evolve detection logic without rewriting all of your policies. Policies target stable semantic context, such as injection confidence, secret presence, or tool risk, rather than service-internal implementation details.
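The four stages above can be sketched in a few lines of Python. This is a minimal illustration and not Shield's actual implementation: every name here (`secret_detector`, `project`, `deny_secrets`, `evaluate`, the `secret_scanner.v2.hits` key) is hypothetical, and the policy is a plain Python function standing in for a Cedar policy. It shows why the projection layer matters: the policy reads only the stable key `secret_present`, so the detector's raw output format could change without touching the policy.

```python
from dataclasses import dataclass, field

# All names below are hypothetical stand-ins, not Shield's real API.


@dataclass
class Decision:
    action: str                                   # "allow" or "deny"
    signals: dict = field(default_factory=dict)   # projected context
    matched_policies: list = field(default_factory=list)


def secret_detector(request: dict) -> dict:
    """Stage 1: a toy detector that emits raw, detector-specific output."""
    text = request.get("content", "")
    return {"secret_scanner.v2.hits": int("AKIA" in text)}


def project(raw: dict) -> dict:
    """Stage 2: normalize raw detector output into stable semantic keys."""
    return {"secret_present": raw.get("secret_scanner.v2.hits", 0) > 0}


def deny_secrets(ctx: dict) -> bool:
    """Stage 3: a policy that depends only on the semantic key."""
    return ctx["secret_present"]


POLICIES = {"deny-secrets": deny_secrets}


def evaluate(request: dict, detectors: list, policies: dict) -> Decision:
    """Stage 4: run the pipeline and return a decision with its signals."""
    raw: dict = {}
    for detector in detectors:
        raw.update(detector(request))
    ctx = project(raw)
    matched = [name for name, rule in policies.items() if rule(ctx)]
    return Decision("deny" if matched else "allow", ctx, matched)
```

If the detector were later replaced with one that reports a different raw key, only `project` would need to change; `deny_secrets` would stay as written.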
Runtime Modes
Shield supports three runtime modes, so you can match enforcement to where you are in the rollout cycle:
enforce: blocks actions when policies resolve to deny
log_only: returns the decision but does not block, which is useful for shadow evaluation
detect_only: skips Cedar enforcement and returns detector context for debugging and tuning
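The difference between the modes can be sketched as a small function. This is an illustrative model under assumptions, not Shield's response format: the `Mode` enum, `apply_mode`, and the returned dictionary keys are hypothetical. Only the mode names and their described behavior come from the text above.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ENFORCE = "enforce"
    LOG_ONLY = "log_only"
    DETECT_ONLY = "detect_only"


def apply_mode(mode: Mode, policy_decision: Optional[str], detector_context: dict) -> dict:
    """Hypothetical helper: turn a policy decision into a runtime outcome."""
    if mode is Mode.DETECT_ONLY:
        # Policy enforcement is skipped entirely; only detector context is returned.
        return {"decision": None, "blocked": False, "context": detector_context}
    # In log_only the decision is reported but never blocks the action.
    blocked = mode is Mode.ENFORCE and policy_decision == "deny"
    return {"decision": policy_decision, "blocked": blocked, "context": detector_context}
```

A typical rollout under this model would start in `detect_only` to tune detectors, move to `log_only` to compare decisions against real traffic, and switch to `enforce` once the deny rate looks right.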
For deeper inspection, Shield also supports richer response tiers:
the default response is optimized for application integration
explain=true adds projected context and root-cause information
debug=true adds per-detector execution details for profiling and troubleshooting
Policy-Driven Enforcement
Guardrails are not just a bundle of detectors. Runtime behavior is controlled by Cedar policies managed through Highflame's shared policy system. That lets you define organization-specific rules such as:
blocking high-confidence injection attempts in production
preventing tools with destructive side effects from running outside approved environments
denying external writes when earlier turns contained secrets or PII
requiring stricter thresholds for public-facing agents than internal workflows
Because the same policy framework is used across products, teams can keep runtime controls auditable and consistent instead of hardcoding enforcement into each application.
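To make the rule examples concrete, here is a rough Python model of two of them: stricter injection thresholds for public-facing agents, and denying external writes after sensitive data has appeared in the session. In Shield these rules would be written as Cedar policies; the Python below is only a stand-in for the logic. The `Context` fields, the threshold values, and the `decide` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical projected context; field names are illustrative."""
    injection_confidence: float        # 0.0 to 1.0
    session_has_sensitive_data: bool   # secrets or PII seen in earlier turns
    action: str                        # e.g. "external_write" or "read"
    agent_exposure: str                # "public" or "internal"


# Assumed thresholds: public-facing agents are held to a stricter (lower) bar.
INJECTION_THRESHOLDS = {"public": 0.6, "internal": 0.85}


def decide(ctx: Context) -> str:
    """Return "deny" if any rule matches, otherwise "allow"."""
    if ctx.injection_confidence >= INJECTION_THRESHOLDS[ctx.agent_exposure]:
        return "deny"
    if ctx.action == "external_write" and ctx.session_has_sensitive_data:
        return "deny"
    return "allow"
```

With these assumed thresholds, the same injection score of 0.7 is denied for a public agent and allowed for an internal one, which is the kind of per-environment difference a shared policy layer is meant to express in one place.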
Managing Guardrails In Studio
Studio is the operational UI for guardrails management. Teams typically use it to:
configure guardrail profiles and baseline protections
manage project-level and agent-level policies
inspect detector metadata and policy outcomes
generate service keys for SDK and API usage
validate behavior in the playground before shipping policy changes
The goal is to make Shield usable as a production control surface, not just a backend API. Developers can integrate Shield directly, while security and platform teams can tune enforcement centrally without having to redeploy every client.