Agent Gateway

The Highflame Agent Gateway is the control plane for AI and agent traffic. It sits between your applications and underlying model providers, acting as an intelligent proxy that routes requests, enforces security and policy controls, and captures the signals needed for observability and governance.

Rather than integrating directly with individual model APIs or MCP tool calls, you can route all agent requests through the Agent Gateway. This allows Highflame to apply model and tool-call guardrails, collect traces, and enforce consistent behavior across models, LLM providers, MCP servers, and environments, all without requiring changes to application logic as your AI stack evolves.

The Agent Gateway is built around three key concepts: Providers, Routes, and MCP Servers and Tools.

LLM Providers

An LLM Provider defines how the Agent Gateway communicates with an underlying AI service or invokes an LLM. It encapsulates the configuration required to authenticate, connect, and send requests to a specific model source, such as a commercial LLM provider, a self-hosted model, or an internal inference service.

When you configure an LLM Provider, you are effectively instructing the Gateway how to communicate with that service. This includes assigning the Provider a name, specifying connection details, and securely storing credentials such as API keys or tokens in Highflame’s Secrets Vault. By centralizing this configuration, LLM Providers make it easy to manage multiple model sources and switch between them without touching application code.
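As an illustration, a Provider record might look like the sketch below. The field names (`name`, `base_url`, `secret_ref`) and the `vault://` reference style are hypothetical, not Highflame's actual schema; the point is that credentials are referenced through the Secrets Vault rather than stored inline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMProvider:
    """Hypothetical shape of a Provider record held by the Gateway."""
    name: str        # label used when a route selects this provider
    base_url: str    # where the Gateway forwards requests
    secret_ref: str  # key into the Secrets Vault; the API key never lives in config

# Example: an OpenAI-compatible provider whose key lives in the vault.
provider = LLMProvider(
    name="openai-prod",
    base_url="https://api.openai.com/v1",
    secret_ref="vault://llm/openai-prod/api-key",
)

print(provider.name)  # routes address the provider by name, never by raw credentials
```

Because application code only ever sees the provider name, swapping the underlying model source is a configuration change, not a code change.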

Advanced Model Configuration

Custom Model Routes

A model route defines how a specific class of requests should be handled by the Agent Gateway. Routes are named, reusable configurations that determine which LLM providers and models are used, which guardrails and policies are enforced, and how requests and responses are processed.

When an application sends a request to Highflame, it specifies a route name. The Gateway uses that route’s configuration to apply the appropriate security controls, select the correct model or provider, and execute any additional behavior such as rate limiting or caching. This allows different parts of an application—or different applications entirely—to use distinct AI behaviors while maintaining centralized visibility and control.

Each Route combines several concerns into a single, declarative unit. It defines the API path exposed by the Gateway, the providers and models that handle the request, the policies and guardrails that must be enforced, and any route-specific behavior that should apply during execution.

Routes consist of these key details:

  • Path: A unique name that becomes part of the API endpoint

  • Models and Providers: The models and providers handling the request

  • Policies: Granular controls that you apply to a Route

  • Behavior: Set up actions for the Route, like rate limiting or caching
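To make the application side of this flow concrete, here is a sketch of building a request addressed to a named route. The gateway host, the `/routes/<name>` path shape, and the payload layout are assumptions for illustration, not Highflame's documented API; only the route name would need to appear in application code.

```python
import json
from urllib.request import Request

GATEWAY = "https://gateway.example.com"  # hypothetical gateway host
ROUTE = "support-chat"                   # route name configured in Highflame

def build_route_request(route: str, prompt: str) -> Request:
    """Build (but do not send) a request addressed to a named route."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return Request(
        url=f"{GATEWAY}/routes/{route}",  # the route's Path becomes the endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_route_request(ROUTE, "How do I reset my password?")
print(req.full_url)  # https://gateway.example.com/routes/support-chat
```

Notice that the request names a route, not a model: which provider and model actually serve it is decided by the route's configuration in the Gateway.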

Types of Model Routes

  • Custom Routes: Tailor a route's behavior by configuring a custom payload or connecting to custom or internal LLM endpoints. You can control the exact request format, headers, and response parsing logic. These are useful for integrating proprietary models, APIs that don’t follow standard formats, or specialized ML workflows.

  • Unified Routes: Handle different model types under a single logical route that dynamically adapts how requests are handled based on request metadata, model preferences, or business logic. These are useful for centralizing multiple capabilities (such as chat and completion) into a single route, A/B testing models, or conditional routing to fallback providers.

  • Auto-Provisioning: When you create a provider, Highflame automatically creates a unified route for it, labeled a Reserved Route. Reserved Routes serve as default access points for your providers, and you can edit or extend them as needed. They are designed to work out of the box, so you can start development and testing without extra manual setup.
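The unified-route idea, one logical route that picks a backend from request metadata with a fallback, can be sketched as a simple dispatch table. The provider and model names and the `task` metadata key are invented for illustration.

```python
# Hypothetical unified route: dispatch on request metadata, with a fallback backend.
BACKENDS = {
    "chat": ("openai-prod", "gpt-4o"),
    "completion": ("anthropic-prod", "claude-sonnet"),
}
FALLBACK = ("local-inference", "llama-3-8b")

def select_backend(metadata: dict) -> tuple:
    """Return a (provider, model) pair for a request, falling back when no rule matches."""
    return BACKENDS.get(metadata.get("task"), FALLBACK)

print(select_backend({"task": "chat"}))     # ('openai-prod', 'gpt-4o')
print(select_backend({"task": "unknown"}))  # ('local-inference', 'llama-3-8b')
```

A real unified route would layer policy checks and A/B or conditional logic on top of this dispatch, but the core shape is the same: one route name, many possible backends.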

MCP Servers & Tool Calling

In addition to routing requests to language models, the Highflame Agent Gateway also provides native support for MCP Servers and Tools, allowing agents to safely interact with external systems, APIs, and internal services.

MCP Servers represent tool backends that expose capabilities an agent can invoke during execution. These may include internal APIs, databases, SaaS services, or custom business logic. By registering MCP Servers with the Agent Gateway, you provide Highflame with a clear understanding of which tools exist, how they are accessed, and the permissions or constraints governing their use.

Tools are the individual operations exposed by an MCP Server. Each tool defines a specific action an agent can perform, along with its expected inputs and outputs. When an agent calls a tool, the request is routed through the Agent Gateway rather than executed directly. This allows Highflame to apply the same guardrails, policies, and observability controls to tool invocations as it does to model requests.
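A Tool pairs a named operation with an input contract. The sketch below uses a JSON-Schema-style parameter description, similar in spirit to how MCP tools declare their inputs; the `lookup_order` tool and its fields are invented for illustration, and the check is deliberately minimal.

```python
# Hypothetical tool exposed by an MCP Server: name, description, and input schema.
lookup_order = {
    "name": "lookup_order",
    "description": "Fetch the status of a customer order by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def validate_args(tool: dict, args: dict) -> bool:
    """Minimal check: every required field is present (a real gateway also validates types)."""
    required = tool["inputSchema"].get("required", [])
    return all(field in args for field in required)

print(validate_args(lookup_order, {"order_id": "A-1001"}))  # True
print(validate_args(lookup_order, {}))                      # False
```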

Unified Agent Control Plane

By routing tool calls through the Gateway, Highflame enables fine-grained control over agent behavior. Input validation guardrails can inspect tool parameters before execution, preventing unauthorized access or malformed requests. Output validation guardrails can ensure that tool responses do not expose sensitive data or violate policy. Tool usage is also fully traced, allowing you to see when a tool was invoked, why it was chosen, and how long it took to execute.
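The input/output guardrail pattern described above can be sketched as a wrapper around tool execution. The path-based deny rule and the SSN-shaped redaction pattern are placeholder policies, not Highflame's actual guardrails.

```python
import re

SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN-shaped strings

def guarded_call(tool, params: dict) -> str:
    # Input phase: reject disallowed parameters before the tool ever runs.
    if "path" in params and params["path"].startswith("/etc"):
        raise PermissionError("input guardrail: disallowed path")
    result = tool(**params)
    # Output phase: redact sensitive data before the response reaches the agent.
    return SENSITIVE.sub("[REDACTED]", result)

def fake_tool(query: str) -> str:  # stand-in tool backend for the example
    return f"record for {query}: ssn 123-45-6789"

print(guarded_call(fake_tool, {"query": "jane"}))  # record for jane: ssn [REDACTED]
```

Because every invocation passes through the same wrapper, the guardrails apply uniformly no matter which agent or tool is involved.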

Together, MCP Servers and Tools extend the Agent Gateway beyond model orchestration, turning it into a unified control plane for both reasoning and action. This ensures that every step an agent takes, whether generating text or interacting with real systems, is observable, governed, and secure.

MCP Traffic Handling and Enforcement

All MCP traffic is intercepted by the Agent Gateway and routed through dedicated proxy endpoints. This ensures complete visibility and centralized enforcement across all MCP interactions, regardless of which agent or application initiated the request.

Requests are evaluated using a two-phase filtering model. Guardrails and policies are applied both before execution (input phase) and after execution (output phase). This provides comprehensive protection against unsafe inputs and malicious or policy-violating responses.

Before forwarding any request, the Gateway validates metadata against the MCP Registry to confirm that the target server is active and that the requested tool is explicitly allowed. Different MCP methods—such as tools, prompts, or resources—are handled with method-specific logic, ensuring that appropriate security controls are applied for each interaction type.
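The pre-forwarding checks can be sketched as: look up the target server in the registry, confirm it is active, and confirm the requested tool is on its allowlist. The registry shape and entries below are hypothetical.

```python
# Hypothetical MCP Registry: servers with an active flag and a tool allowlist.
REGISTRY = {
    "billing-mcp": {"active": True, "allowed_tools": {"lookup_order", "refund"}},
    "legacy-mcp": {"active": False, "allowed_tools": {"ping"}},
}

def authorize(server: str, tool: str) -> bool:
    """Gate a tool request before the gateway forwards it."""
    entry = REGISTRY.get(server)
    return bool(entry and entry["active"] and tool in entry["allowed_tools"])

print(authorize("billing-mcp", "refund"))  # True: active server, allowed tool
print(authorize("legacy-mcp", "ping"))     # False: server is not active
print(authorize("billing-mcp", "rm_rf"))   # False: tool not on the allowlist
```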

All MCP requests and responses are recorded with full contextual metadata, creating a complete audit trail suitable for security investigations, compliance, and threat analysis.

MCP Protocol and Content Support

Highflame fully supports the MCP streamable HTTP protocol, allowing agents to interact with MCP Servers using the standard specification without custom adapters.
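MCP messages are JSON-RPC 2.0, so a tool invocation passing through the Gateway looks roughly like the message below (the `tools/call` method comes from the MCP specification; the tool name and arguments are illustrative).

```python
import json

# A tools/call request as defined by the MCP specification (JSON-RPC 2.0).
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_order",                # illustrative tool name
        "arguments": {"order_id": "A-1001"},
    },
}

wire = json.dumps(call)           # what actually travels over the HTTP transport
print(json.loads(wire)["method"])  # tools/call
```

Because this is the standard wire format, the Gateway can inspect the `method` and `params` of every message to apply its method-specific handling without custom adapters.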

By integrating MCP Servers and Tools directly into the Agent Gateway, Highflame ensures that every agent action—whether generating text or interacting with real systems—is explicit, observable, and governed by design.
