# Multi-Protocol Gateway

The Highflame Agent Gateway is the unified security gateway for AI, MCP, and agent-to-agent traffic. It sits between your agents and downstream model providers and tool servers, acting as an intelligent proxy that routes requests, enforces guardrails and Cedar policies, and captures the telemetry needed for observability and governance.

Rather than integrating directly with individual model APIs or MCP servers, you route all agent traffic through the Gateway. This lets Highflame apply consistent guardrails, collect traces, and enforce policy across all models, providers, MCP servers, and environments — without changes to application logic as your stack evolves.

***

### LLM Providers

An LLM Provider defines how the Gateway communicates with an underlying AI service. It encapsulates the configuration needed to authenticate, connect, and send requests to a specific model source — commercial LLM APIs, self-hosted models, or internal inference services.

Credentials (API keys, tokens) are stored securely in Highflame's Secrets Vault. By centralizing this configuration, providers make it easy to manage multiple model sources and switch between them without touching application code.

**Supported providers:**

| Provider             | Model format example              |
| -------------------- | --------------------------------- |
| OpenAI               | `openai/gpt-4o`                   |
| Anthropic            | `anthropic/claude-sonnet-4-6`     |
| Azure OpenAI         | `azure/gpt-4o`                    |
| Google Gemini        | `gemini/gemini-2.0-flash`         |
| Groq                 | `groq/llama-3.3-70b-versatile`    |
| Mistral              | `mistral/mistral-large`           |
| Ollama (self-hosted) | `ollama/llama3.2`                 |
| Together AI          | `together/meta-llama/Llama-3-70b` |
| DeepSeek             | `deepseek/deepseek-chat`          |

Models are specified as `provider/model` in route configuration. The Gateway handles request transformation, authentication, and structural differences between providers automatically.

**Exposed API endpoints:**

The Gateway exposes OpenAI-compatible endpoints so existing agent SDKs work without modification:

| Endpoint                        | Purpose                          |
| ------------------------------- | -------------------------------- |
| `POST /v1/chat/completions`     | Chat completions (all providers) |
| `POST /v1/responses`            | OpenAI Responses API             |
| `POST /v1/embeddings`           | Text embeddings                  |
| `POST /v1/images/generations`   | Image generation                 |
| `POST /v1/audio/speech`         | Text-to-speech                   |
| `POST /v1/audio/transcriptions` | Speech-to-text                   |
| `POST /v1/audio/translations`   | Audio translation                |
| `POST /v1/moderations`          | Content moderation               |

**Retry behavior:** The Gateway automatically retries on `429` and `5xx` responses with exponential backoff, up to 3 attempts.

***

### Routes

A Route defines how a specific request class is handled. Routes are named, reusable configurations that determine which providers and models are used, which guardrails and Cedar policies are enforced, and how requests and responses are processed.

When an application sends a request to the Gateway, it specifies a route. The Gateway uses that route's configuration to apply the appropriate controls, select the correct model or provider, and execute any additional behavior.

**Route configuration fields:**

* **Path** — unique name that becomes part of the API endpoint
* **Models and Providers** — which models handle the request; specified as `provider/model`
* **Policies** — Cedar policies scoped to this route
* **Behavior** — rate limiting, caching, retry overrides, request/response transforms

**Route Characteristics:**

* **Customizable** — full control over request format, headers, and response parsing. Useful for proprietary models, non-standard APIs, or specialized ML workflows.
* **Simplified** — single endpoint that maps to multiple providers; the Gateway handles format differences. Enables provider switching without application changes.
* **Auto-provisioned** — routes created automatically from provider configuration with sensible defaults, minimizing manual setup.

***

### MCP Servers and Tool Calling

In addition to routing LLM requests, the Gateway provides native support for MCP Servers and Tools, allowing agents to safely interact with external systems, APIs, and internal services.

**MCP Servers** are registered tool backends that expose capabilities an agent can invoke. **Tools** are the individual operations each server exposes. When an agent calls a tool, the request is routed through the Gateway rather than executed directly — applying the same guardrails, policies, and observability controls as LLM requests.

**MCP endpoint:** `POST /mcp/{slug}/`

Each registered MCP server is accessible at a stable slug URL. For example, a server registered as `github` is available at `/mcp/github/`.

#### Credential modes for MCP servers

MCP servers often require per-user or per-tenant credentials. The Gateway supports three modes — see [Credential Modes](/agent-gateway/credential-modes.md) for full details:

* **Internal** — shared server connection, no per-user credentials
* **OAuth Passthrough** — client provides token; Gateway forwards it unchanged; includes MCP OAuth 2.1 + PKCE discovery
* **Token Broker** — Gateway fetches per-user token from Admin API on each request

#### MCP traffic handling

All MCP traffic is intercepted and routed through dedicated proxy endpoints. Requests are evaluated using a two-phase model: guardrails and policies applied both before execution (input phase) and after execution (output phase).

Before forwarding any request, the Gateway validates against the MCP Registry to confirm the target server is active and the requested tool is explicitly enabled. Method-specific logic handles `tools/call`, `tools/list`, `prompts/*`, and `resources/*` with appropriate controls for each.

All MCP requests and responses are recorded with full contextual metadata, creating a complete audit trail for security investigations, compliance, and threat analysis.

#### MCP protocol support

The Gateway fully implements the MCP Streamable HTTP protocol (RFC 8545), allowing agents to interact with MCP servers using the standard specification without custom adapters or protocol shims.

***

### Guardrails and Policy Enforcement

Shield guardrails run inline on every request and response passing through the Gateway — both LLM traffic and MCP tool calls. Evaluation covers four lifecycle points (user prompt, tool call, tool response, model response). If a Cedar policy produces a `deny` decision, the Gateway returns `403` and does not forward the request. If Shield is unreachable, the Gateway fails open and logs a warning.

See [Integrated Guardrails](/agent-gateway/agent-gateway.md) for evaluation mechanics, route configuration, and enforcement modes. For policy authoring, see the [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md).

***

### Observability

Every request routed through the Gateway generates a trace in Observatory. Traces capture latency breakdowns, token usage, guardrail evaluations, tool invocations, and payload snapshots at each stage. See the [Observatory Overview](/observatory/observatory.md) and [Traces](/observatory/traces.md) for investigation surfaces.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.highflame.ai/agent-gateway/ai-gateway.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.