Multi-Protocol Gateway
The Highflame Agent Gateway — unified LLM proxy, MCP routing, guardrails, and observability for all agent traffic.
The Highflame Agent Gateway is the unified security gateway for AI, MCP, and agent-to-agent traffic. It sits between your agents and downstream model providers and tool servers, acting as an intelligent proxy that routes requests, enforces guardrails and Cedar policies, and captures the telemetry needed for observability and governance.
Rather than integrating directly with individual model APIs or MCP servers, you route all agent traffic through the Gateway. This lets Highflame apply consistent guardrails, collect traces, and enforce policy across all models, providers, MCP servers, and environments — without changes to application logic as your stack evolves.
LLM Providers
An LLM Provider defines how the Gateway communicates with an underlying AI service. It encapsulates the configuration needed to authenticate, connect, and send requests to a specific model source — commercial LLM APIs, self-hosted models, or internal inference services.
Credentials (API keys, tokens) are stored securely in Highflame's Secrets Vault. By centralizing this configuration, providers make it easy to manage multiple model sources and switch between them without touching application code.
Supported providers:
OpenAI
openai/gpt-4o
Anthropic
anthropic/claude-sonnet-4-6
Azure OpenAI
azure/gpt-4o
Google Gemini
gemini/gemini-2.0-flash
Groq
groq/llama-3.3-70b-versatile
Mistral
mistral/mistral-large
Ollama (self-hosted)
ollama/llama3.2
Together AI
together/meta-llama/Llama-3-70b
DeepSeek
deepseek/deepseek-chat
Models are specified as provider/model in route configuration. The Gateway handles request transformation, authentication, and structural differences between providers automatically.
Exposed API endpoints:
The Gateway exposes OpenAI-compatible endpoints so existing agent SDKs work without modification:
POST /v1/chat/completions
Chat completions (all providers)
POST /v1/responses
OpenAI Responses API
POST /v1/embeddings
Text embeddings
POST /v1/images/generations
Image generation
POST /v1/audio/speech
Text-to-speech
POST /v1/audio/transcriptions
Speech-to-text
POST /v1/audio/translations
Audio translation
POST /v1/moderations
Content moderation
Retry behavior: The Gateway automatically retries on 429 and 5xx responses with exponential backoff, up to 3 attempts.
Routes
A Route defines how a specific request class is handled. Routes are named, reusable configurations that determine which providers and models are used, which guardrails and Cedar policies are enforced, and how requests and responses are processed.
When an application sends a request to the Gateway, it specifies a route. The Gateway uses that route's configuration to apply the appropriate controls, select the correct model or provider, and execute any additional behavior.
Route configuration fields:
Path — unique name that becomes part of the API endpoint
Models and Providers — which models handle the request; specified as
provider/modelPolicies — Cedar policies scoped to this route
Behavior — rate limiting, caching, retry overrides, request/response transforms
Route Characteristics:
Customizable — full control over request format, headers, and response parsing. Useful for proprietary models, non-standard APIs, or specialized ML workflows.
Simplified — single endpoint that maps to multiple providers; the Gateway handles format differences. Enables provider switching without application changes.
Auto-provisioned — routes created automatically from provider configuration with sensible defaults, minimizing manual setup.
MCP Servers and Tool Calling
In addition to routing LLM requests, the Gateway provides native support for MCP Servers and Tools, allowing agents to safely interact with external systems, APIs, and internal services.
MCP Servers are registered tool backends that expose capabilities an agent can invoke. Tools are the individual operations each server exposes. When an agent calls a tool, the request is routed through the Gateway rather than executed directly — applying the same guardrails, policies, and observability controls as LLM requests.
MCP endpoint: POST /mcp/{slug}/
Each registered MCP server is accessible at a stable slug URL. For example, a server registered as github is available at /mcp/github/.
Credential modes for MCP servers
MCP servers often require per-user or per-tenant credentials. The Gateway supports three modes — see Credential Modes for full details:
Internal — shared server connection, no per-user credentials
OAuth Passthrough — client provides token; Gateway forwards it unchanged; includes MCP OAuth 2.1 + PKCE discovery
Token Broker — Gateway fetches per-user token from Admin API on each request
MCP traffic handling
All MCP traffic is intercepted and routed through dedicated proxy endpoints. Requests are evaluated using a two-phase model: guardrails and policies applied both before execution (input phase) and after execution (output phase).
Before forwarding any request, the Gateway validates against the MCP Registry to confirm the target server is active and the requested tool is explicitly enabled. Method-specific logic handles tools/call, tools/list, prompts/*, and resources/* with appropriate controls for each.
All MCP requests and responses are recorded with full contextual metadata, creating a complete audit trail for security investigations, compliance, and threat analysis.
MCP protocol support
The Gateway fully implements the MCP Streamable HTTP protocol (RFC 8545), allowing agents to interact with MCP servers using the standard specification without custom adapters or protocol shims.
Guardrails and Policy Enforcement
Shield guardrails run inline on every request and response passing through the Gateway — both LLM traffic and MCP tool calls. Evaluation covers four lifecycle points (user prompt, tool call, tool response, model response). If a Cedar policy produces a deny decision, the Gateway returns 403 and does not forward the request. If Shield is unreachable, the Gateway fails open and logs a warning.
See Integrated Guardrails for evaluation mechanics, route configuration, and enforcement modes. For policy authoring, see the Cedar Cookbook.
Observability
Every request routed through the Gateway generates a trace in Observatory. Traces capture latency breakdowns, token usage, guardrail evaluations, tool invocations, and payload snapshots at each stage. See the Observatory Overview and Traces for investigation surfaces.
Last updated