Multi-Protocol Gateway

The Highflame Agent Gateway — unified LLM proxy, MCP routing, guardrails, and observability for all agent traffic.

The Highflame Agent Gateway is the unified security gateway for AI, MCP, and agent-to-agent traffic. It sits between your agents and downstream model providers and tool servers, acting as an intelligent proxy that routes requests, enforces guardrails and Cedar policies, and captures the telemetry needed for observability and governance.

Rather than integrating directly with individual model APIs or MCP servers, you route all agent traffic through the Gateway. This lets Highflame apply consistent guardrails, collect traces, and enforce policy across all models, providers, MCP servers, and environments — without changes to application logic as your stack evolves.


LLM Providers

An LLM Provider defines how the Gateway communicates with an underlying AI service. It encapsulates the configuration needed to authenticate, connect, and send requests to a specific model source — commercial LLM APIs, self-hosted models, or internal inference services.

Credentials (API keys, tokens) are stored securely in Highflame's Secrets Vault. By centralizing this configuration, providers make it easy to manage multiple model sources and switch between them without touching application code.

Supported providers:

Provider
Model format example

OpenAI

openai/gpt-4o

Anthropic

anthropic/claude-sonnet-4-6

Azure OpenAI

azure/gpt-4o

Google Gemini

gemini/gemini-2.0-flash

Groq

groq/llama-3.3-70b-versatile

Mistral

mistral/mistral-large

Ollama (self-hosted)

ollama/llama3.2

Together AI

together/meta-llama/Llama-3-70b

DeepSeek

deepseek/deepseek-chat

Models are specified as provider/model in route configuration. The Gateway handles request transformation, authentication, and structural differences between providers automatically.

Exposed API endpoints:

The Gateway exposes OpenAI-compatible endpoints so existing agent SDKs work without modification:

Endpoint
Purpose

POST /v1/chat/completions

Chat completions (all providers)

POST /v1/responses

OpenAI Responses API

POST /v1/embeddings

Text embeddings

POST /v1/images/generations

Image generation

POST /v1/audio/speech

Text-to-speech

POST /v1/audio/transcriptions

Speech-to-text

POST /v1/audio/translations

Audio translation

POST /v1/moderations

Content moderation

Retry behavior: The Gateway automatically retries on 429 and 5xx responses with exponential backoff, up to 3 attempts.


Routes

A Route defines how a specific request class is handled. Routes are named, reusable configurations that determine which providers and models are used, which guardrails and Cedar policies are enforced, and how requests and responses are processed.

When an application sends a request to the Gateway, it specifies a route. The Gateway uses that route's configuration to apply the appropriate controls, select the correct model or provider, and execute any additional behavior.

Route configuration fields:

  • Path — unique name that becomes part of the API endpoint

  • Models and Providers — which models handle the request; specified as provider/model

  • Policies — Cedar policies scoped to this route

  • Behavior — rate limiting, caching, retry overrides, request/response transforms

Route Characteristics:

  • Customizable — full control over request format, headers, and response parsing. Useful for proprietary models, non-standard APIs, or specialized ML workflows.

  • Simplified — single endpoint that maps to multiple providers; the Gateway handles format differences. Enables provider switching without application changes.

  • Auto-provisioned — routes created automatically from provider configuration with sensible defaults, minimizing manual setup.


MCP Servers and Tool Calling

In addition to routing LLM requests, the Gateway provides native support for MCP Servers and Tools, allowing agents to safely interact with external systems, APIs, and internal services.

MCP Servers are registered tool backends that expose capabilities an agent can invoke. Tools are the individual operations each server exposes. When an agent calls a tool, the request is routed through the Gateway rather than executed directly — applying the same guardrails, policies, and observability controls as LLM requests.

MCP endpoint: POST /mcp/{slug}/

Each registered MCP server is accessible at a stable slug URL. For example, a server registered as github is available at /mcp/github/.

Credential modes for MCP servers

MCP servers often require per-user or per-tenant credentials. The Gateway supports three modes — see Credential Modes for full details:

  • Internal — shared server connection, no per-user credentials

  • OAuth Passthrough — client provides token; Gateway forwards it unchanged; includes MCP OAuth 2.1 + PKCE discovery

  • Token Broker — Gateway fetches per-user token from Admin API on each request

MCP traffic handling

All MCP traffic is intercepted and routed through dedicated proxy endpoints. Requests are evaluated using a two-phase model: guardrails and policies applied both before execution (input phase) and after execution (output phase).

Before forwarding any request, the Gateway validates against the MCP Registry to confirm the target server is active and the requested tool is explicitly enabled. Method-specific logic handles tools/call, tools/list, prompts/*, and resources/* with appropriate controls for each.

All MCP requests and responses are recorded with full contextual metadata, creating a complete audit trail for security investigations, compliance, and threat analysis.

MCP protocol support

The Gateway fully implements the MCP Streamable HTTP protocol (RFC 8545), allowing agents to interact with MCP servers using the standard specification without custom adapters or protocol shims.


Guardrails and Policy Enforcement

Shield guardrails run inline on every request and response passing through the Gateway — both LLM traffic and MCP tool calls. Evaluation covers four lifecycle points (user prompt, tool call, tool response, model response). If a Cedar policy produces a deny decision, the Gateway returns 403 and does not forward the request. If Shield is unreachable, the Gateway fails open and logs a warning.

See Integrated Guardrails for evaluation mechanics, route configuration, and enforcement modes. For policy authoring, see the Cedar Cookbook.


Observability

Every request routed through the Gateway generates a trace in Observatory. Traces capture latency breakdowns, token usage, guardrail evaluations, tool invocations, and payload snapshots at each stage. See the Observatory Overview and Traces for investigation surfaces.

Last updated