Guardrails

Highflame Guardrails help you detect threats and protect your systems across every interaction in agent traffic, from user prompts to tool calls to model outputs.

Guardrails operate as context-aware semantic checkpoints that continuously inspect requests and responses as they flow through your system. Unlike traditional filters that evaluate a single message in isolation, Highflame Guardrails maintain awareness of the full conversational and execution context, enabling accurate, real-time enforcement of security, safety, and policy controls.

Guardrails can be:

  • Invoked via a high-performance REST API for explicit, per-request enforcement

  • Embedded directly in the Highflame Agent Gateway for inline protection across all agent traffic

How Guardrails Work

Each request, response, or tool invocation may be inspected in real time, with Guardrails maintaining awareness of the broader conversational and execution context. This includes previous messages, system and developer instructions, agent goals, and request metadata. At each checkpoint, Guardrails:

  1. Observe inputs, outputs, and intermediate tool calls

  2. Evaluate them against configured security, policy, and safety rules

  3. Take real-time actions (e.g., block, redact, modify, or allow)

  4. Emit structured alerts and telemetry when violations occur

When a Guardrail evaluates content, it produces a concrete decision based on configured policies. Depending on the severity and type of violation, the Guardrail may allow the content to pass unchanged, modify it by masking or redacting sensitive information, or block it entirely.

At the same time, Highflame emits structured events and alerts, giving teams visibility into what happened and why a decision was made. This design makes Guardrails predictable, auditable, and suitable for production systems that require both security and operational clarity.
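The observe, evaluate, act, and emit steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Highflame's implementation: `Action`, `Decision`, `Event`, `evaluate`, and both sample rules are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    BLOCK = "block"


@dataclass
class Decision:
    action: Action
    content: Optional[str]  # None when the content is blocked
    reason: str = ""


@dataclass
class Event:
    rule: str
    action: str
    reason: str


# Hypothetical rule signature: (content, context) -> Decision
Rule = Callable[[str, dict], Decision]


def mask_password(content: str, context: dict) -> Decision:
    # Toy rule: redact one known secret value.
    if "hunter2" in content:
        redacted = content.replace("hunter2", "[REDACTED]")
        return Decision(Action.REDACT, redacted, "password in content")
    return Decision(Action.ALLOW, content)


def block_destructive_tool_call(content: str, context: dict) -> Decision:
    # Toy rule: uses context (the kind of traffic), not just the content.
    if context.get("kind") == "tool_call" and "delete" in content:
        return Decision(Action.BLOCK, None, "destructive tool call")
    return Decision(Action.ALLOW, content)


RULES: List[Tuple[str, Rule]] = [
    ("mask_password", mask_password),
    ("block_destructive_tool_call", block_destructive_tool_call),
]


def evaluate(content: str, context: dict,
             rules: List[Tuple[str, Rule]], events: List[Event]) -> Decision:
    """Run rules in order, emit an event per violation, return one decision."""
    current = content
    final = Decision(Action.ALLOW, current)
    for name, rule in rules:
        decision = rule(current, context)
        if decision.action is Action.ALLOW:
            continue
        events.append(Event(name, decision.action.value, decision.reason))
        if decision.action is Action.BLOCK:
            return decision  # blocking stops evaluation immediately
        current = decision.content  # later rules see the redacted content
        final = Decision(Action.REDACT, current, decision.reason)
    return final
```

Every non-allow outcome appends an `Event`, so the caller always has both the decision and a record of why it was made.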

Core Guardrail Capabilities

Each capability represents a class of protection. Individual guardrails within each capability perform targeted, real-time detection and enforcement.

AI Threat Protection

AI Threat Protection Guardrails are built to defend against attempts to manipulate model behavior or derail agent execution. Unlike traditional security threats, these attacks often unfold gradually, across multiple turns of a conversation, and are designed to exploit the model’s ability to follow instructions rather than to rely on a single malformed input.

Highflame’s AI Threat Protection continuously analyzes how user intent evolves over time and compares it against the agent's expected behavior. If a user attempts to override system instructions, introduce hidden directives via indirect prompt injection, or slowly steer an agent toward an unintended goal, Guardrails can detect these trajectory changes. This allows Highflame to stop attacks such as multi-turn jailbreaks or intent laundering before the agent performs unsafe actions or reveals restricted information. These Guardrails include:

  • Direct and indirect prompt injection detection

  • Multi-turn jailbreak detection

  • Intent and agent trajectory drift detection
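To see why multi-turn awareness matters, consider a toy monitor that scores each turn and also tracks a running total over recent turns. This is only a sketch: the phrase list, weights, thresholds, and the `ConversationMonitor` class are hypothetical, and simple phrase matching stands in for the semantic analysis described above.

```python
from collections import deque

# Hypothetical phrase weights; a real detector would reason about meaning.
SUSPICIOUS_PHRASES = {
    "ignore": 0.4,
    "system prompt": 0.5,
    "pretend": 0.3,
    "no restrictions": 0.5,
}


def turn_score(message: str) -> float:
    """Score a single message in isolation, capped at 1.0."""
    text = message.lower()
    total = sum(w for phrase, w in SUSPICIOUS_PHRASES.items() if phrase in text)
    return min(1.0, total)


class ConversationMonitor:
    """Blocks when one turn, or the recent conversation as a whole, looks risky."""

    def __init__(self, window: int = 4,
                 per_turn_limit: float = 0.8,
                 cumulative_limit: float = 1.0) -> None:
        self.scores = deque(maxlen=window)  # scores of the most recent turns
        self.per_turn_limit = per_turn_limit
        self.cumulative_limit = cumulative_limit

    def observe(self, message: str) -> str:
        score = turn_score(message)
        self.scores.append(score)
        if score >= self.per_turn_limit:
            return "block"  # a single message is enough
        if sum(self.scores) >= self.cumulative_limit:
            return "block"  # no single message is, but the trajectory is
        return "allow"
```

In this sketch, three turns that each stay under the per-turn limit can still trigger a block once their combined score crosses the cumulative limit, which is the behavior a single-message filter cannot provide.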

Sensitive Data Protection

Sensitive Data Protection Guardrails ensure that confidential or regulated information does not leave your system unintentionally. These Guardrails inspect both incoming requests and outgoing responses, allowing you to protect data regardless of whether it originates from a user, an internal system, or a model itself.

Sensitive Infotype Detection

Highflame detects common sensitive data types such as personally identifiable information, protected health information, and security credentials, including API keys, tokens, and secrets. Once detected, Guardrails apply the actions you’ve configured, such as masking, redacting, replacing, or anonymizing the data before it is sent to a model, written to logs, or returned to a client. This approach allows teams to use powerful AI systems while maintaining compliance with privacy, security, and regulatory requirements.
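The difference between masking, redacting, and anonymizing is easiest to see side by side. The sketch below assumes simple regex detectors and a per-infotype action table; `DETECTORS`, `ACTIONS`, `protect`, and the `sk-` key format are all hypothetical and are not taken from Highflame's configuration.

```python
import hashlib
import re

# Hypothetical detectors; real infotype detection is more sophisticated.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask(value: str) -> str:
    """Hide everything except the last four characters."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def redact(value: str) -> str:
    """Remove the value entirely."""
    return "[REDACTED]"


def anonymize(value: str) -> str:
    """Replace the value with a stable pseudonym derived from a hash."""
    return "anon_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


# Configured action per infotype; anything unlisted falls back to redact.
ACTIONS = {"email": anonymize, "api_key": mask}


def protect(text: str) -> str:
    for infotype, pattern in DETECTORS.items():
        action = ACTIONS.get(infotype, redact)
        # Bind `action` as a default argument so each lambda keeps its own.
        text = pattern.sub(lambda m, action=action: action(m.group(0)), text)
    return text
```

Anonymizing with a hash keeps the same input mapped to the same token, which preserves joinability in logs, while masking keeps just enough of a credential for a human to recognize which one it was.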

Regex Filtering

Regex Filtering provides deterministic, high-performance pattern matching for scenarios where sensitive data or restricted content follows well-defined formats. While semantic detection excels at understanding meaning and intent, regex-based guardrails are ideal for catching known patterns with precision and minimal overhead.

In practice, Regex Filtering is often used to block or sanitize internal identifiers, secret formats, or compliance-sensitive strings. Because these Guardrails operate inline and are evaluated quickly, they are suitable for high-throughput production environments where predictable behavior and low latency are critical.
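A minimal sketch of regex-based blocking and sanitizing follows. The `RegexRule` type, the `apply_regex_rules` function, and the `INT-` ticket format are hypothetical; the `AKIA` prefix followed by 16 uppercase alphanumerics is the widely documented shape of an AWS access key ID.

```python
import re
from typing import NamedTuple, Optional


class RegexRule(NamedTuple):
    name: str
    pattern: re.Pattern
    action: str  # "block" or "sanitize"


# Patterns are compiled once at load time, not per request.
REGEX_RULES = [
    RegexRule("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "block"),
    RegexRule("internal_ticket", re.compile(r"\bINT-\d{6}\b"), "sanitize"),
]


def apply_regex_rules(text: str) -> Optional[str]:
    """Return the sanitized text, or None if a blocking rule matched."""
    for rule in REGEX_RULES:
        if rule.action == "block" and rule.pattern.search(text):
            return None
        if rule.action == "sanitize":
            text = rule.pattern.sub(f"<{rule.name}>", text)
    return text
```

Because each rule is a fixed pattern, the same input always yields the same result, which is the predictability the paragraph above refers to.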

Keyword Matching

Keyword Matching Guardrails allow teams to enforce policies based on exact or approximate matches against curated keyword libraries. This capability is particularly useful for brand protection, topic restrictions, or enforcing agent-specific vocabularies.

Rather than relying on a single hardcoded list, Highflame allows keyword libraries to be tailored to specific agents, environments, or use cases. Fuzzy matching enables Guardrails to catch variations and misspellings, while still allowing precise control over enforcement actions. This makes Keyword Matching a flexible tool for shaping how agents communicate and what topics they are allowed to engage with.
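Per-agent keyword libraries with fuzzy matching can be approximated with Python's standard `difflib` module. The library contents, agent names, and `match_keywords` function here are hypothetical, and the similarity cutoff is an arbitrary choice for illustration.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical keyword libraries, one per agent.
KEYWORD_LIBRARIES = {
    "support-agent": ["refund", "lawsuit"],
    "sales-agent": ["competitor"],
}


def match_keywords(text: str, agent: str, cutoff: float = 0.8) -> List[Tuple[str, str]]:
    """Return (token, matched_keyword) pairs for the given agent's library."""
    library = KEYWORD_LIBRARIES.get(agent, [])
    hits = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # get_close_matches returns library entries whose similarity ratio
        # to the token is at least `cutoff`, best match first.
        close = difflib.get_close_matches(token, library, n=1, cutoff=cutoff)
        if close:
            hits.append((token, close[0]))
    return hits
```

Raising `cutoff` toward 1.0 moves the behavior toward exact matching; lowering it catches more misspellings at the cost of more false positives.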

Content Safety

Content Safety Guardrails help ensure that both user input and model output align with your organization’s trust and safety standards. These Guardrails are designed to catch harmful, abusive, or otherwise inappropriate content before it reaches end users or downstream systems.

Highflame evaluates content semantically, allowing it to identify violations even when they are phrased indirectly or obfuscated. Depending on your policies, Guardrails can block unsafe content entirely, sanitize it, or allow it through while generating alerts for further review. This enables teams to balance safety requirements with usability, rather than relying on overly aggressive filtering.

Input Validation

Input Validation Guardrails protect agent systems from malicious or anomalous inputs that could compromise safety or integrity. This is especially important in applications where agents invoke tools, query internal systems, or operate with elevated permissions.

Highflame inspects inputs for signs of phishing links, invisible instructions, or patterns that suggest misuse. By validating inputs, Guardrails help ensure that agents operate within their intended boundaries and do not perform actions they were never designed to handle.
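One concrete form of "invisible instructions" is text hidden in Unicode format characters such as zero-width spaces. A minimal check, assuming that flagging every character in Unicode general category `Cf` is acceptable for the application, might look like this (both function names are hypothetical):

```python
import unicodedata
from typing import List, Tuple


def find_invisible_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, code point) for every Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible_chars(text: str) -> str:
    """Remove all Unicode format (Cf) characters from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

Note that category `Cf` also covers characters with legitimate uses, such as the zero-width joiner in emoji sequences, so a production check would likely need an allowlist rather than a blanket strip.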

Output Validation

Output Validation Guardrails focus on the final step of the interaction: the system's output. Even well-behaved agents can produce unsafe or non-compliant outputs, especially when dealing with complex instructions or sensitive data.

Before responses are returned to users or passed to downstream systems, Highflame scans them for policy violations, sensitive information, unauthorized code generation, and other risks. If an issue is detected, Guardrails can redact or block the output and generate alerts. This final checkpoint helps prevent data leakage and ensures that responses remain aligned with organizational expectations.

Custom Policy Enforcement

Custom Policy Enforcement allows organizations to define Guardrails that reflect their unique security posture, regulatory environment, and operational constraints. Rather than forcing all agents to follow a single global rule set, Highflame enables policies to be scoped to specific agents, teams, or environments.

These policies can combine multiple signals, such as intent, content, metadata, and execution context, to create nuanced enforcement logic. This enables stricter controls on external-facing agents, limits access to sensitive tools, and enforces regional compliance requirements without duplicating logic across systems.
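Scoped policies that combine several signals can be modeled as a scope filter plus a condition. The sketch below is hypothetical throughout: the `Policy` fields, the request keys (`exposure`, `region`, `tool`, `contains_pii`), and the tool name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    name: str
    scope: Dict[str, str]              # request fields that must match; {} is global
    condition: Callable[[dict], bool]  # combines content, metadata, context signals
    action: str                        # e.g. "block" or "redact"


SENSITIVE_TOOLS = {"payments.refund"}

POLICIES: List[Policy] = [
    Policy(
        name="external-agents-no-sensitive-tools",
        scope={"exposure": "external"},
        condition=lambda req: req.get("tool") in SENSITIVE_TOOLS,
        action="block",
    ),
    Policy(
        name="eu-redact-pii",
        scope={"region": "eu"},
        condition=lambda req: bool(req.get("contains_pii")),
        action="redact",
    ),
]


def in_scope(policy: Policy, request: dict) -> bool:
    return all(request.get(key) == value for key, value in policy.scope.items())


def decide(request: dict, policies: List[Policy]) -> str:
    """Return the action of the first in-scope policy whose condition holds."""
    for policy in policies:
        if in_scope(policy, request) and policy.condition(request):
            return policy.action
    return "allow"
```

Keeping scope separate from condition means the same condition logic can be reused across agents or regions by changing only the scope.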

Real-time Threat Detection

Highflame Guardrails are built for real-time, low-latency semantic detection and enforcement across AI and agent traffic. Each guardrail evaluates requests, responses, and tool interactions as they flow through the system, reasoning about both content and context to identify security risks as they arise.

Stacked & Composable Guardrails

Highflame Guardrails are designed to be stacked and composed into flexible enforcement pipelines that adapt to your application's needs. Rather than forcing all security checks to run in a single, monolithic step, Guardrails can be combined into ordered sets, with each guardrail performing a specific function within the overall control flow. This allows teams to build layered protections that are easier to reason about, tune, and evolve over time.

Each guardrail can be executed either synchronously or asynchronously, depending on its purpose and performance profile. Latency-sensitive guardrails, such as prompt injection detection, input validation, or policy enforcement that must occur before a request reaches a model or tool, can be evaluated inline and enforced immediately. These guardrails operate within tight latency budgets and are optimized for real-time decision-making on production traffic.

More computationally expensive or exploratory guardrails, such as deeper semantic analysis, anomaly detection, or post-response evaluation, can be executed asynchronously. Running these guardrails out of band allows Highflame to perform richer analysis without blocking critical execution paths or degrading user-facing performance. Even when executed asynchronously, these guardrails still contribute to system safety by generating alerts, triggering follow-up actions, or informing future policy decisions.

This execution model enables a layered, context-aware approach to protection. Guardrails can share context, build on prior decisions, and collectively enforce policies with increasing precision, all while maintaining predictable performance characteristics. By separating enforcement-critical checks from observational or analytical ones, Highflame allows teams to apply robust security controls without sacrificing responsiveness or scalability in production environments.
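The split between inline and out-of-band guardrails can be sketched with `asyncio`. Everything named here (`injection_check`, `length_check`, `slow_semantic_review`, `handle`) is hypothetical, and the string checks are placeholders for real detection logic.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

InlineGuardrail = Callable[[str], bool]  # True means "allow"
BackgroundGuardrail = Callable[[str, List[str]], Awaitable[None]]


def injection_check(text: str) -> bool:
    return "ignore previous instructions" not in text.lower()


def length_check(text: str) -> bool:
    return len(text) < 2000


async def slow_semantic_review(text: str, alerts: List[str]) -> None:
    await asyncio.sleep(0.01)  # stands in for expensive analysis
    if "password" in text.lower():
        alerts.append("review: possible credential discussion")


INLINE: List[InlineGuardrail] = [injection_check, length_check]
BACKGROUND: List[BackgroundGuardrail] = [slow_semantic_review]


async def handle(text: str,
                 inline: List[InlineGuardrail],
                 background: List[BackgroundGuardrail],
                 alerts: List[str]) -> Tuple[str, List[asyncio.Task]]:
    # Inline guardrails run in order and can stop the request immediately.
    for guardrail in inline:
        if not guardrail(text):
            return "blocked", []
    # Background guardrails are scheduled but not awaited, so they cannot
    # delay the verdict; they only append alerts when they finish.
    tasks = [asyncio.create_task(g(text, alerts)) for g in background]
    return "allowed", tasks
```

The caller receives the verdict as soon as the inline checks pass and can await the returned tasks later (for example with `asyncio.gather`) to collect any alerts raised by the slower analysis.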
