# Multi-Agent

Multi-agent system (MAS) policies enforce trust-tiered access control and cross-turn session safety in orchestrated multi-agent systems, where a central orchestrator coordinates sub-agents that share a common trust context.

***

## How MAS differs from A2A

In an orchestrated multi-agent system, a single orchestrator controls which sub-agents run and in what order. Sub-agents operate within a shared session and do not communicate peer-to-peer. This changes the threat model:

* **Cross-origin attacks** are unlikely — all agents share the orchestrator's trust context
* **Session-level threats** are the primary vector — an attacker who compromises one turn can affect all subsequent turns
* **Autonomous sub-agents** need tighter thresholds — no human oversight means injection and jailbreak signals must be acted on earlier
* **Command injection** is treated as a full-session threat — after a command injection fires, even first-party agents are blocked from shell access

Compare this to [A2A policies](/agent-authorization-and-control-shield/policy-templates/a2a-policies.md), which focus on identity spoofing and confused deputy attacks that arise when agents operate without a central coordinator.

| Dimension                    | MAS                                | A2A                                |
| ---------------------------- | ---------------------------------- | ---------------------------------- |
| Trust establishment          | Orchestrator-mediated              | Self-reported via ZeroID           |
| Primary attack vector        | Cross-turn session escalation      | Identity spoofing, confused deputy |
| Cross-origin rules           | Not included                       | Critical signal (5 rules)          |
| Cumulative risk threshold    | 200                                | 150                                |
| Threat turn lockout          | 5 turns                            | 3 turns                            |
| Post-command-injection scope | All agents (including first-party) | Per-agent restrictions             |
| Autonomous agent thresholds  | Injection/jailbreak lowered to 50  | Blocked if unverified              |

***

## Policy profiles

MAS policies are organized into two profiles located in `schemas/guardrails/templates/profiles/multi_agent/`.

***

### 1. Agent trust

**Profile:** `agent_trust.cedar`

Controls which agents can perform which operations based on their trust level and type. These rules gate access before any content-based detection runs.

#### Dangerous tools — first-party only

Only agents with `agent_trust_level == "first_party"` can invoke tools in the `dangerous` category. Verified third-party and unverified agents are blocked unconditionally, regardless of the payload.

```cedar
@id("multi-agent-only-first-party-dangerous")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has tool_category && context.tool_category == "dangerous" &&
    context has agent_trust_level && context.agent_trust_level != "first_party"
};
```

#### Sensitive tools — verified minimum

Sensitive tools require at least `verified_third_party` trust. Unverified agents are blocked from sensitive tools entirely.
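A rule for this gate might look like the following sketch, modeled on the dangerous-tools rule above. The `@id`, the `@severity` value, and the `"sensitive"` category string are assumptions made for illustration; check `agent_trust.cedar` for the names the profile actually uses.

```cedar
// Hypothetical sketch: the @id, severity, and "sensitive" category value are
// assumed, not copied from agent_trust.cedar.
@id("multi-agent-sensitive-requires-verified")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has tool_category && context.tool_category == "sensitive" &&
    context has agent_trust_level && context.agent_trust_level == "unverified"
};
```

Matching on `== "unverified"` rather than `!= "first_party"` is what lets verified third-party agents through while still blocking unverified ones.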

#### Double-unverified MCP block

An unverified agent attempting to connect to an unverified MCP server is blocked. When neither the agent nor the tool server can be attested, the combination is too risky to allow.

```cedar
@id("multi-agent-block-unverified-mcp")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has agent_trust_level && context.agent_trust_level == "unverified" &&
    context has mcp_server_verified && context.mcp_server_verified == false
};
```

#### Autonomous agent caps

Autonomous agents (those operating without human oversight in the session loop) are subject to a cap on tool risk and to lower blocking thresholds for injection and jailbreak detection:

| Signal                     | Standard threshold           | Autonomous threshold              |
| -------------------------- | ---------------------------- | --------------------------------- |
| Tool risk ceiling          | Unrestricted for first-party | `<= 70` regardless of trust level |
| Injection confidence block | 80                           | 50                                |
| Jailbreak confidence block | 80                           | 50                                |

```cedar
@id("multi-agent-autonomous-tool-risk-cap")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has agent_type && context.agent_type == "autonomous" &&
    context has tool_risk_score && context.tool_risk_score > 70
};
```
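The lowered injection threshold from the table could be expressed along the same lines. This is a sketch only: the `@id`, the `injection_confidence` field name, and the choice of `>=` rather than `>` at the boundary are assumptions, and the action is left unconstrained because this page does not say which action carries detection scores.

```cedar
// Hypothetical sketch: the @id, the injection_confidence field name, and the
// >= boundary are assumptions, not taken from agent_trust.cedar.
@id("multi-agent-autonomous-injection-threshold")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action,
    resource
)
when {
    context has agent_type && context.agent_type == "autonomous" &&
    context has injection_confidence && context.injection_confidence >= 50
};
```

A jailbreak rule would follow the same shape with the corresponding confidence field.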

***

### 2. Agent safety

**Profile:** `agent_safety.cedar`

Session-aware policies that lock down agent capabilities after a threat signal fires in any earlier turn. These rules implement circuit-breaker behavior across the session — a single high-confidence detection changes what every subsequent turn is allowed to do.

#### Post-PII detection: network and file lockdown

After PII is detected anywhere in the session (`session_pii_detected == true`):

* Non-first-party agents are blocked from network tools (`http_post`, `send_email`, `http_request`, `webhook`) — preventing exfiltration of PII that was just observed
* Unverified agents are blocked from file writes — preventing PII from being written to disk and later retrieved

```cedar
@id("multi-agent-pii-block-network-tools")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_pii_detected && context.session_pii_detected == true &&
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has tool_name &&
    (context.tool_name == "http_post" || context.tool_name == "send_email" ||
     context.tool_name == "http_request" || context.tool_name == "webhook")
};
```
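The file-write restriction for unverified agents could be sketched in the same way. The `@id`, the `@severity` value, and the `"file_write"` category string are hypothetical; the shipped profile may match on tool names instead.

```cedar
// Hypothetical sketch: the @id, severity, and "file_write" category value are
// assumed for illustration.
@id("multi-agent-pii-block-file-write")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_pii_detected && context.session_pii_detected == true &&
    context has agent_trust_level && context.agent_trust_level == "unverified" &&
    context has tool_category && context.tool_category == "file_write"
};
```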

#### Post-secrets detection: sensitive tool lockdown

After secrets are detected in the session (`session_secrets_detected == true`), non-first-party agents are blocked from all sensitive tools. This prevents a credential observed in one turn from being used by a downstream agent in a later turn.
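As a sketch, this lockdown could be written as a single `forbid` rule. The `session_secrets_detected` and `agent_trust_level` fields come from this page; the `@id`, the `@severity` value, and the `"sensitive"` category string are assumptions.

```cedar
// Hypothetical sketch: the @id, severity, and "sensitive" category value are
// assumed; the session flag and trust-level fields are from this page.
@id("multi-agent-secrets-block-sensitive-tools")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_secrets_detected && context.session_secrets_detected == true &&
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has tool_category && context.tool_category == "sensitive"
};
```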

#### Post-injection detection: unverified agent lockdown

After a prompt injection fires in the session (`session_injection_detected == true`), unverified agents are blocked from all tool calls. This keeps an injection that succeeded against one agent from being carried forward by an unverified agent later in the session.
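A minimal sketch of this rule follows. Only the `@id` and `@severity` values are invented here; the context fields are the ones named on this page. Because the rule has no tool condition, it applies to every `call_tool` request from an unverified agent.

```cedar
// Hypothetical sketch: the @id and severity are assumed for illustration.
@id("multi-agent-post-injection-block-unverified")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_injection_detected && context.session_injection_detected == true &&
    context has agent_trust_level && context.agent_trust_level == "unverified"
};
```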

#### Post-command-injection: full shell lockdown

After command injection fires, **all agents** — including first-party — are blocked from shell access for the remainder of the session. Command injection represents a persistent compromise of the execution environment; the shell is closed until the session ends.

```cedar
@id("multi-agent-post-command-injection-block-shell")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_command_injection && context.session_command_injection == true &&
    context has tool_category && context.tool_category == "shell"
};
```

#### Cumulative risk circuit breaker

As risk accumulates across turns, two graduated tiers apply:

| Tier          | Condition                                                           | Effect                                                 |
| ------------- | ------------------------------------------------------------------- | ------------------------------------------------------ |
| Restriction   | `session_cumulative_risk_score > 200`                               | Non-first-party agents restricted from sensitive tools |
| Full lockdown | `session_cumulative_risk_score > 500` or `session_threat_turns > 5` | Unverified agents blocked from all tool calls          |

The 200 threshold is deliberately higher than the A2A equivalent (150) because the orchestrator can intervene between turns, providing an additional layer of containment that A2A systems lack.
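The two tiers in the table can be sketched as a pair of `forbid` rules. The thresholds and the `session_cumulative_risk_score` and `session_threat_turns` fields come from this page; the `@id` values, severities, and the `"sensitive"` category string are assumptions.

```cedar
// Hypothetical sketch of the restriction tier: the @id, severity, and
// "sensitive" category value are assumed.
@id("multi-agent-cumulative-risk-restrict-sensitive")
@severity("high")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has session_cumulative_risk_score && context.session_cumulative_risk_score > 200 &&
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has tool_category && context.tool_category == "sensitive"
};

// Hypothetical sketch of the full-lockdown tier: either condition alone is
// enough to block every tool call from an unverified agent.
@id("multi-agent-cumulative-risk-lockdown")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has agent_trust_level && context.agent_trust_level == "unverified" &&
    ((context has session_cumulative_risk_score && context.session_cumulative_risk_score > 500) ||
     (context has session_threat_turns && context.session_threat_turns > 5))
};
```

Each `has` check is paired with its own comparison inside the parentheses, so the lockdown rule still evaluates cleanly when only one of the two session counters is present in the context.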

***

## Applying MAS profiles

{% tabs %}
{% tab title="Python" %}

```python
from highflame.shield import GuardrailsClient

client = GuardrailsClient(api_key="...")

# Load both MAS profiles
client.policies.load_profile("multi_agent/agent_trust")
client.policies.load_profile("multi_agent/agent_safety")

# Or load the full multi-agent profile bundle
client.policies.load_profile("multi_agent/*")
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import { GuardrailsClient } from "@highflame/sdk";

const client = new GuardrailsClient({ apiKey: "..." });

// Load both MAS profiles
await client.policies.loadProfile("multi_agent/agent_trust");
await client.policies.loadProfile("multi_agent/agent_safety");

// Or load the full multi-agent profile bundle
await client.policies.loadProfile("multi_agent/*");
```

{% endtab %}
{% endtabs %}

Profiles can also be applied in **Highflame Studio** → **Shield** → **Policies** → **Apply profile**.

### Recommended rollout

`agent_trust` rules (tier-gating by trust level) have near-zero false-positive rates and can be applied in Block mode immediately. Run the `agent_safety` session circuit breakers in Monitor mode for one week, and review detection rates across your session population before switching to Block.

| Profile        | Recommended initial mode     |
| -------------- | ---------------------------- |
| `agent_trust`  | Block                        |
| `agent_safety` | Monitor → Block after review |

***

## Passing agent identity in context

MAS policies evaluate the `agent_trust_level`, `agent_type`, and `agent_id` context fields on every request. These fields are projected automatically from the ZeroID JWT when agents authenticate through the Agent Gateway or Shield SDK. For agents that do not authenticate this way, set the fields explicitly:

{% tabs %}
{% tab title="Python" %}

```python
result = await client.guardrails.evaluate(
    messages=messages,
    context={
        "agent_id": "agent:acme:summarizer:v2",
        "agent_type": "tool_agent",
        "agent_trust_level": "verified_third_party",
    }
)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
const result = await client.guardrails.evaluate({
    messages,
    context: {
        agent_id: "agent:acme:summarizer:v2",
        agent_type: "tool_agent",
        agent_trust_level: "verified_third_party",
    }
});
```

{% endtab %}
{% endtabs %}

If `agent_trust_level` is not provided, the evaluation engine defaults to `unverified`. Any policy that restricts unverified agents will apply.

***

## Related

* [A2A Policies](/agent-authorization-and-control-shield/policy-templates/a2a-policies.md) — policies for independent peer-to-peer agent communication
* [Policy Templates](/agent-authorization-and-control-shield/policy-templates.md) — all available policy profiles and how to combine them
* [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — writing custom rules for multi-agent scenarios
* [Agent Identity (ZeroID)](/agent-identity-zeroid/introduction.md) — issuing and managing agent identities
* [Agent Delegation](/agent-identity-zeroid/guides/agent-delegation.md) — scoping delegated identity in multi-agent workflows

