Securing Model Calls
Every LLM request routed through the Agent Gateway passes through a layered set of security controls — before the request reaches the model provider, during provider interaction, and before the response is returned to the caller. This page documents each control layer.
Provider credential isolation
Applications never handle model provider API keys. When you register a provider in Highflame Studio, the API key is stored in the Secrets Vault (backed by AWS Secrets Manager with KMS encryption at rest). Applications reference a named route; the Gateway resolves the route to a provider and retrieves the credential from the vault at request time.
This means:
A compromised agent or application cannot leak the underlying model API key — it never has access to it
Rotating a provider key requires updating it in one place (the vault), not across every application
If the secrets detector fires on an inbound request, it has caught an attempt to exfiltrate a key that the caller should never have held in the first place
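The lookup flow described above can be sketched roughly as follows. This is a minimal illustration, not the Gateway's implementation: the ROUTES and VAULT dictionaries are hypothetical in-memory stand-ins for Studio route configuration and the Secrets Vault, and the route name, provider name, and key values are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for Studio route configuration and the Secrets Vault.
ROUTES = {"support-chat": {"provider": "example-provider",
                           "secret_name": "providers/example/key"}}
VAULT = {"providers/example/key": "sk-example-not-a-real-key"}


@dataclass(frozen=True)
class ResolvedCall:
    provider: str
    api_key: str  # only ever exists inside the Gateway process


def resolve_route(route_name: str) -> ResolvedCall:
    """Map a route name to its provider and fetch the key at request time."""
    route = ROUTES[route_name]              # KeyError for unknown routes
    api_key = VAULT[route["secret_name"]]   # looked up on every request
    return ResolvedCall(provider=route["provider"], api_key=api_key)


def rotate_key(secret_name: str, new_key: str) -> None:
    """Rotation touches only the vault entry; callers keep using the route name."""
    VAULT[secret_name] = new_key
```

Because the caller only ever passes a route name, a rotated key takes effect on the next call to resolve_route without any change on the application side.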
Caller authentication
Every request to the Gateway must present a valid credential before any model call is initiated. Authentication is resolved in the following priority order:
1. Highflame API key, sent as X-Highflame-APIKey: hf_sk_...
2. Service key (bearer), sent as Authorization: Bearer hf_sk_...
3. ZeroID JWT (OAuth), sent as Authorization: Bearer eyJ... (RS256-signed)
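As a rough illustration of the priority order above, the following sketch classifies a request by its headers. The function name and the return labels are hypothetical, and telling service keys apart from JWTs by the hf_sk_ prefix is an assumption made for the example, not documented Gateway behavior.

```python
from typing import Mapping, Optional, Tuple


def resolve_credential(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (credential_type, value) in priority order, or None if absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    api_key = h.get("x-highflame-apikey")
    if api_key:
        return ("api_key", api_key)

    scheme, _, token = h.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None  # no usable credential was presented
    # Assumption: a bearer token with the hf_sk_ prefix is a service key.
    if token.startswith("hf_sk_"):
        return ("service_key", token)
    return ("zeroid_jwt", token)
```

A None result corresponds to the unauthenticated case; the sketch only classifies the credential and does not validate it.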
Unauthenticated requests receive a 401 before any downstream processing occurs. JWTs are validated against the Highflame JWKS endpoint; signing keys are refreshed every 5 minutes, with a lazy refresh whenever a token presents an unknown key ID. Token claims carry the caller's account_id, project_id, and agent identity, which are propagated into guardrail context and Observatory traces.
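The key refresh behavior (a timed refresh plus a lazy refresh on unknown key IDs) could look something like the sketch below. The JwksCache class, its fetch callback, and the injectable clock are all hypothetical; the callback stands in for an HTTP call to the JWKS endpoint, and signature verification itself is out of scope here.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches signing keys by key ID; refreshes on a timer and on unknown IDs."""

    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical: returns {kid: public_key}
        self._ttl = ttl_seconds      # 300 s matches the 5-minute refresh
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        self._keys = dict(self._fetch())
        self._fetched_at = self._clock()

    def get(self, kid: str) -> str:
        expired = (self._fetched_at is None
                   or self._clock() - self._fetched_at >= self._ttl)
        # Lazy refresh: an unknown key ID triggers a fetch before giving up.
        if expired or kid not in self._keys:
            self._refresh()
        if kid not in self._keys:
            raise KeyError(f"unknown signing key: {kid}")
        return self._keys[kid]
```

A production cache would likely also rate-limit refreshes triggered by unknown key IDs so that forged tokens cannot force a fetch on every request; that is omitted here for brevity.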
Agents using ZeroID issue short-lived JWTs through the highflame-authn service. These tokens expire quickly, limiting the blast radius of a stolen credential and enabling continuous access evaluation (CAE) for real-time revocation.
Model pinning
Callers specify a route name, not a model. The route configuration — stored in Highflame Studio and controlled by administrators — determines which provider and model handle the request. Applications cannot override this at call time.
This prevents:
Model substitution attacks — an agent cannot be manipulated into routing a sensitive request to a different, less-safe model
Provider drift — model selection is an admin decision, not a developer decision, and changes are audited
Prompt-based model switching — even if an injected prompt attempts to redirect to a different model endpoint, the Gateway ignores it and uses the route's configured model
To change which model a route uses, an admin updates the route configuration in Studio. No application changes are required, and the change is reflected on the next request.
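One way to picture model pinning is a request builder that discards any model or provider field the caller supplies and takes both from the route. The sketch below assumes a hypothetical ROUTE_CONFIG table and invented route, provider, and model names; the real configuration lives in Studio.

```python
# Hypothetical route table; in the product this is admin-controlled in Studio.
ROUTE_CONFIG = {
    "support-chat": {"provider": "example-provider", "model": "example-model-v1"},
}


def build_provider_request(route_name: str, caller_body: dict) -> dict:
    """Build the upstream request, always taking provider and model from the route."""
    route = ROUTE_CONFIG[route_name]
    # Drop caller-supplied model/provider fields rather than trusting them.
    body = {k: v for k, v in caller_body.items() if k not in ("model", "provider")}
    return {**body, "provider": route["provider"], "model": route["model"]}
```

Since the model is read from the route on every call, an admin edit to the route entry is picked up by the next request with no application change.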
Input evaluation
Before any request is forwarded to the model provider, it passes through the Shield detection pipeline. Detection runs in three tiers ordered by latency:
Tier 1 — Fast (< 5 ms, deterministic)
Runs on every request. At under 5 ms, these checks have negligible impact on the latency budget:
Secrets: API keys, bearer tokens, SSH keys, PEM certificates, cloud credentials (AWS, GCP, Azure, GitHub)
PII: Email addresses, phone numbers, SSNs, credit card numbers, IP addresses
Security patterns: SQL injection, XSS, path traversal, command injection, invisible Unicode characters
Keyword matching: Aho-Corasick matching against configured blocklists
Tool risk scoring: Scores tool call arguments (0–100) and flags sensitive tool categories
Action pattern: State-machine detection of multi-step attack sequences (e.g., read file → HTTP post)
Loop detection: Detects repeated identical tool calls above the configured threshold (default: 5)
Token budget: Checks session token consumption against per-session limits
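To make one of the Tier 1 detectors concrete, here is a minimal sketch of loop detection. It assumes that "repeated" means back-to-back identical calls and that the detector fires once the run is strictly longer than the threshold; both are interpretations for the example, and detect_loop is a hypothetical name.

```python
import json
from typing import Iterable, Tuple


def detect_loop(tool_calls: Iterable[Tuple[str, dict]], threshold: int = 5) -> bool:
    """Return True when an identical tool call repeats more than `threshold` times in a row."""
    previous = None
    run_length = 0
    for name, args in tool_calls:
        # Canonicalise arguments so key order cannot hide an identical call.
        fingerprint = (name, json.dumps(args, sort_keys=True))
        run_length = run_length + 1 if fingerprint == previous else 1
        previous = fingerprint
        if run_length > threshold:
            return True
    return False
```

The check is a single pass with constant state, which is consistent with a deterministic detector that has to fit inside a small latency budget.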
Tier 2 — Standard (10–200 ms, ML/NLP)
Runs on requests where Tier 1 does not early-exit:
Injection: ML model for prompt injection and jailbreak attempts (produces injection_confidence 0–100)
Deep context: Multi-turn injection detection using conversation history across the session
Toxicity: Violence, hate speech, sexual content, criminal activity, weapons, profanity
Language: Language identification across 75 languages
Sentiment: Sentiment analysis for downstream policy use
Tier 3 — Slow (50–500 ms, cloud APIs)
Runs when elevated risk signals warrant deeper inspection:
Enterprise DLP: Google Cloud DLP for regulated PII detection with fuzzy matching
Content Safety: Google Model Armor content safety API
Phishing: CheckPhish URL scanning for malicious links in prompts
Early exit: If Tier 1 produces a deny decision and the Cedar policy does not require further evidence, Tiers 2 and 3 are skipped. This keeps the median latency impact under 5 ms for requests that are clearly safe or clearly blocked.
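The early-exit behavior can be sketched as a loop over tiers that stops as soon as the accumulated signals justify a final deny. Everything here is illustrative: run_pipeline, the two toy detectors, and the is_final_deny callback (standing in for "Tier 1 denies and the policy needs no further evidence") are hypothetical names, and the sketch does not model the risk-based trigger for Tier 3.

```python
from typing import Callable, Dict, List

Detector = Callable[[str], Dict[str, object]]


def run_pipeline(text: str, tiers: List[List[Detector]],
                 is_final_deny: Callable[[Dict[str, object]], bool]) -> Dict[str, object]:
    """Run detector tiers in order, skipping later tiers after a final deny."""
    signals: Dict[str, object] = {}
    tiers_run = 0
    for tier in tiers:
        for detector in tier:
            signals.update(detector(text))
        tiers_run += 1
        if is_final_deny(signals):
            break  # slower tiers are never invoked
    signals["tiers_run"] = tiers_run
    return signals


# Toy detectors standing in for a Tier 1 and a Tier 2 check.
def secrets_detector(text: str) -> Dict[str, object]:
    return {"contains_secrets": "hf_sk_" in text}


def injection_detector(text: str) -> Dict[str, object]:
    return {"injection_confidence": 90 if "ignore previous" in text.lower() else 0}
```

With tiers set to [[secrets_detector], [injection_detector]] and a deny rule on contains_secrets, a prompt containing a key stops after the first tier, while a clean prompt runs both.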
Cedar policy enforcement
Detector outputs are projected into Cedar context keys (injection_confidence, pii_detected, contains_secrets, tool_risk_score, etc.) and evaluated against your Cedar policies. The policy decision determines the enforcement action:
Block: Request is rejected with 403; violation recorded in Observatory
Redact: Detected content is masked before the request is forwarded to the model
Alert: Request is forwarded; a detection event is emitted for review
Monitor: Request is forwarded; detection is recorded silently with no user notification
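The relationship between context keys and enforcement actions might be pictured as below. This is a toy Python stand-in, not Cedar and not the Gateway's evaluator: decide_action, enforce, and every threshold are invented for illustration, and real decisions come from your Cedar policies.

```python
from typing import Any, Dict, Optional, Tuple


def decide_action(context: Dict[str, Any]) -> Optional[str]:
    """Toy stand-in for a Cedar policy set; thresholds are illustrative only."""
    injection = int(context.get("injection_confidence", 0))
    if context.get("contains_secrets") or injection >= 80:
        return "block"
    if context.get("pii_detected"):
        return "redact"
    if int(context.get("tool_risk_score", 0)) >= 50:
        return "alert"
    if injection >= 40:
        return "monitor"
    return None  # no policy matched; the request proceeds unchanged


def enforce(action: Optional[str], prompt: str,
            redacted_prompt: str) -> Tuple[bool, Optional[str]]:
    """Return (forward_to_model, text_to_forward) for an enforcement action."""
    if action == "block":
        return False, None            # caller receives a 403
    if action == "redact":
        return True, redacted_prompt  # masked text is what the model sees
    return True, prompt               # alert and monitor forward unchanged
```

Alert and monitor differ only in how the detection is surfaced, so both take the same forwarding path in this sketch.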
Session-aware context
The detection pipeline is session-aware. Detection signals and action history are persisted in Redis across turns, so policies can respond to the cumulative state of a conversation.
Session state includes:
session_pii_detected: whether PII has appeared in any turn
session_secrets_detected: whether secrets have appeared in any turn
session_injection_detected: whether an injection signal has fired in any turn
session_cumulative_risk_score: running total of risk contributions across all turns
session_threat_turns: number of turns with at least one threat signal
session_max_injection_score: peak injection score seen in any turn
This enables circuit-breaker policies that lock down sensitive operations after a threshold of risk accumulates — see Abuse Detection and Control for patterns.
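As a sketch of how per-turn signals might fold into the session keys listed above, the example below keeps state in a plain dictionary instead of Redis. The record_turn and circuit_open helpers, the injection threshold of 80, and the risk limit of 100 are all hypothetical values chosen for illustration.

```python
from typing import Any, Dict


def new_session() -> Dict[str, Any]:
    return {
        "session_pii_detected": False,
        "session_secrets_detected": False,
        "session_injection_detected": False,
        "session_cumulative_risk_score": 0,
        "session_threat_turns": 0,
        "session_max_injection_score": 0,
    }


def record_turn(state: Dict[str, Any], *, pii: bool = False, secrets: bool = False,
                injection_score: int = 0, risk: int = 0,
                injection_threshold: int = 80) -> Dict[str, Any]:
    """Fold one turn's detector signals into the running session state."""
    injected = injection_score >= injection_threshold
    state["session_pii_detected"] |= pii
    state["session_secrets_detected"] |= secrets
    state["session_injection_detected"] |= injected
    state["session_cumulative_risk_score"] += risk
    state["session_max_injection_score"] = max(
        state["session_max_injection_score"], injection_score)
    if pii or secrets or injected:
        state["session_threat_turns"] += 1
    return state


def circuit_open(state: Dict[str, Any], risk_limit: int = 100) -> bool:
    """Circuit-breaker check: lock down once accumulated risk reaches the limit."""
    return state["session_cumulative_risk_score"] >= risk_limit
```

The sticky boolean flags never reset within a session, which is what lets a policy react to something that happened several turns earlier.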
Output evaluation
Model responses are evaluated by the same Shield pipeline before being returned to the caller. Output-phase detection checks for:
PII in response: Personal information generated or repeated by the model
Secrets in response: Credentials the model may have retrieved or hallucinated
Hallucination: Factual inconsistency scored by the hallucination detector
Phishing links: Malicious URLs in model-generated content
Toxicity: Harmful content in the model's output
If the output evaluation produces a deny, the response is blocked and the caller receives a 403 — the model's response is never returned. If redact applies, sensitive content is masked before the response leaves the Gateway.
Rate limiting and token budgets
Provider rate limits: The Gateway automatically retries on 429 (rate limited) and 5xx responses from model providers using exponential backoff, up to 3 attempts. This is transparent to callers.
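A retry loop of the kind described here could look like the following sketch. It reads "up to 3 attempts" as three total calls (the first try plus two retries), which is an interpretation; the call_with_retry name, the 0.5 s base delay, and the send callback standing in for the provider call are hypothetical.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, str]], max_attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        retryable = status == 429 or 500 <= status <= 599
        if not retryable or attempt == max_attempts:
            return status, body
        # Exponential backoff: the delay doubles after each failed attempt.
        sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Real implementations usually add random jitter to the delay and honor a Retry-After header when the provider sends one; both are left out to keep the sketch short.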
Per-session token budgets: Token consumption is tracked across turns in the session. When a session exceeds its configured budget, the budget_exceeded flag fires and Cedar policies can block further requests. This prevents runaway sessions from consuming unbounded model API spend.
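The budget check reduces to a running sum compared against a limit. The sketch below uses a hypothetical SessionBudget class and assumes the flag fires once usage strictly exceeds the configured limit; in the product the flag is handed to Cedar policies, which decide whether to block.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    limit_tokens: int
    used_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one call's usage and return the budget_exceeded flag."""
        self.used_tokens += input_tokens + output_tokens
        return self.used_tokens > self.limit_tokens
```

For example, with a limit of 1000 tokens, a call using 600 tokens leaves the flag unset and a second call using 500 sets it.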
Cost tracking: Every model call is recorded with token counts and cost attribution in Observatory. The Command Center cost intelligence view surfaces per-agent and per-model cost breakdowns in real time.
Audit trail
Every model call generates a trace in Observatory containing:
Caller identity (agent ID, user, application)
Route and provider/model used
Request timestamp and end-to-end latency
Shield evaluation results: detector scores, Cedar policy decisions, determining policies
Token usage (input and output tokens)
Cost attribution
Full payload snapshot at each evaluation point (subject to your data handling policy)
Traces are retained for 30 days. Detection events are retained for 90 days. See Traces and Threats in Observatory for investigation surfaces.
Related
Multi-Protocol Gateway — provider configuration, routes, and MCP overview
Integrated Guardrails — Cedar policy configuration for gateway traffic
Credential Modes — per-user credential handling for MCP servers
Policy Templates — pre-built Cedar profiles for common deployment types
Abuse Detection and Control — loop detection, budget controls, session circuit breakers