Multi-Agent

Multi-agent system (MAS) policies enforce trust-tiered access control and cross-turn session safety in orchestrated multi-agent systems, where a central orchestrator coordinates sub-agents that share a common trust context.


How MAS differs from A2A

In an orchestrated multi-agent system, a single orchestrator controls which sub-agents run and in what order. Sub-agents operate within a shared session and do not communicate peer-to-peer. This changes the threat model:

  • Cross-origin attacks are unlikely — all agents share the orchestrator's trust context

  • Session-level threats are the primary vector — an attacker who compromises one turn can affect all subsequent turns

  • Autonomous sub-agents need tighter thresholds — no human oversight means injection and jailbreak signals must be acted on earlier

  • Command injection is treated as a full-session threat — after a command injection fires, even first-party agents are blocked from shell access

Compare this to A2A policies, which focus on identity spoofing and confused deputy attacks that arise when agents operate without a central coordinator.

| Dimension | MAS | A2A |
| --- | --- | --- |
| Trust establishment | Orchestrator-mediated | Self-reported via ZeroID |
| Primary attack vector | Cross-turn session escalation | Identity spoofing, confused deputy |
| Cross-origin rules | Not included | Critical signal (5 rules) |
| Cumulative risk threshold | 200 | 150 |
| Threat turn lockout | 5 turns | 3 turns |
| Post-command-injection scope | All agents (including first-party) | Per-agent restrictions |
| Autonomous agent thresholds | Injection/jailbreak lowered to 50 | Blocked if unverified |


Policy profiles

MAS policies are organized into two profiles located in schemas/guardrails/templates/profiles/multi_agent/.


1. Agent trust

Profile: agent_trust.cedar

Controls which agents can perform which operations based on their trust level and type. These rules gate access before any content-based detection runs.

Dangerous tools — first-party only

Only agents with agent_trust_level == "first_party" can invoke tools in the dangerous category. Third-party and unverified agents are unconditionally blocked regardless of the payload.

Sensitive tools — verified minimum

Sensitive tools require at minimum verified_third_party trust. Unverified agents are blocked from sensitive tool access entirely.

Double-unverified MCP block

An unverified agent attempting to connect to an unverified MCP server is blocked. When neither the agent nor the tool server can be attested, the combination is too high risk to allow.
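The three gating rules above can be modeled in a few lines. The following is a minimal Python sketch of the decision logic only, not the contents of agent_trust.cedar. The function name, the `tool_category` and `mcp_server_verified` parameters, the "standard" category, and the numeric ordering of trust levels are assumptions made for illustration.

```python
from typing import Optional

# Trust levels named in the text; the numeric ordering is an assumption.
TRUST_RANK = {"unverified": 0, "verified_third_party": 1, "first_party": 2}


def trust_gate(agent_trust_level: str,
               tool_category: str,
               mcp_server_verified: Optional[bool] = None) -> str:
    """Return "allow" or "block" for a single tool call (illustrative only)."""
    # Unknown trust levels are treated as unverified.
    rank = TRUST_RANK.get(agent_trust_level, 0)

    # Dangerous tools: first-party only.
    if tool_category == "dangerous" and rank < TRUST_RANK["first_party"]:
        return "block"

    # Sensitive tools: verified_third_party at minimum.
    if tool_category == "sensitive" and rank < TRUST_RANK["verified_third_party"]:
        return "block"

    # Double-unverified: unverified agent talking to an unverified MCP server.
    if rank == 0 and mcp_server_verified is False:
        return "block"

    return "allow"
```

Because these checks depend only on trust level and tool category, they can run before any content-based detection, which matches the ordering described for this profile.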

Autonomous agent caps

Autonomous agents (those operating without human oversight in the session loop) are subject to stricter limits than other agents on the following signals:

| Signal | Standard threshold | Autonomous threshold |
| --- | --- | --- |
| Tool risk ceiling | Unrestricted for first-party | <= 70 regardless of trust level |
| Injection confidence block | 80 | 50 |
| Jailbreak confidence block | 80 | 50 |
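As a rough illustration of how the autonomous caps change the outcome for the same signal values, here is a small Python sketch. The `agent_type` value "autonomous", the function and parameter names, and the choice of an inclusive comparison (`>=`) for the confidence thresholds are assumptions; the source does not state whether the thresholds are inclusive. The standard tool-risk limit for non-first-party agents is not given in the text, so only the autonomous ceiling is modeled.

```python
from typing import List

STANDARD_BLOCK = 80                # injection/jailbreak block threshold, standard agents
AUTONOMOUS_BLOCK = 50              # same signals, autonomous agents
AUTONOMOUS_TOOL_RISK_CEILING = 70  # applies regardless of trust level


def block_reasons(agent_type: str,
                  tool_risk: int,
                  injection_confidence: int,
                  jailbreak_confidence: int) -> List[str]:
    """List the signals that would block this call (empty list means no block)."""
    autonomous = agent_type == "autonomous"  # hypothetical agent_type value
    block_at = AUTONOMOUS_BLOCK if autonomous else STANDARD_BLOCK

    reasons = []
    if autonomous and tool_risk > AUTONOMOUS_TOOL_RISK_CEILING:
        reasons.append("tool_risk")
    if injection_confidence >= block_at:  # inclusive comparison is an assumption
        reasons.append("injection")
    if jailbreak_confidence >= block_at:
        reasons.append("jailbreak")
    return reasons
```

In this model, an injection confidence of 60 passes for a supervised agent but blocks an autonomous one.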


2. Agent safety

Profile: agent_safety.cedar

Session-aware policies that lock down agent capabilities after a threat signal fires in any earlier turn. These rules implement circuit-breaker behavior across the session — a single high-confidence detection changes what every subsequent turn is allowed to do.

Post-PII detection: network and file lockdown

After PII is detected anywhere in the session (session_pii_detected == true):

  • Non-first-party agents are blocked from network tools (http_post, send_email, http_request, webhook) — preventing exfiltration of PII that was just observed

  • Unverified agents are blocked from file writes — preventing PII from being written to disk and later retrieved

Post-secrets detection: sensitive tool lockdown

After secrets are detected in the session (session_secrets_detected == true), non-first-party agents are blocked from all sensitive tools. This prevents a credential observed in one turn from being used by a downstream agent in a later turn.

Post-injection detection: unverified agent lockdown

After a prompt injection fires in the session (session_injection_detected == true), unverified agents are blocked from all tool calls. An injection that succeeded against one agent in the session cannot be carried forward by another unverified agent.

Post-command-injection: full shell lockdown

After command injection fires, all agents — including first-party — are blocked from shell access for the remainder of the session. Command injection represents a persistent compromise of the execution environment; the shell is closed until the session ends.
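The four lockdowns above share one pattern: a session flag set in an earlier turn narrows what later turns may do. The Python sketch below models that pattern under stated assumptions. It is not the Cedar source of agent_safety.cedar. The `SessionState` class, the `command_injection_detected` flag name, and the `tool_category`, `is_file_write`, and `is_shell` parameters are hypothetical; the network tool names come from the text.

```python
from dataclasses import dataclass

# Network tools listed in the post-PII rule.
NETWORK_TOOLS = {"http_post", "send_email", "http_request", "webhook"}


@dataclass
class SessionState:
    pii_detected: bool = False                # session_pii_detected
    secrets_detected: bool = False            # session_secrets_detected
    injection_detected: bool = False          # session_injection_detected
    command_injection_detected: bool = False  # flag name is an assumption


def session_gate(state: SessionState,
                 agent_trust_level: str,
                 tool_name: str,
                 tool_category: str = "standard",
                 is_file_write: bool = False,
                 is_shell: bool = False) -> str:
    """Return "allow" or "block" given flags raised in earlier turns."""
    first_party = agent_trust_level == "first_party"
    unverified = agent_trust_level == "unverified"

    # Command injection closes the shell for every agent, first-party included.
    if state.command_injection_detected and is_shell:
        return "block"
    # Prompt injection: unverified agents lose all tool calls.
    if state.injection_detected and unverified:
        return "block"
    # Secrets: non-first-party agents lose sensitive tools.
    if state.secrets_detected and not first_party and tool_category == "sensitive":
        return "block"
    # PII: network lockdown for non-first-party, file-write lockdown for unverified.
    if state.pii_detected:
        if not first_party and tool_name in NETWORK_TOOLS:
            return "block"
        if unverified and is_file_write:
            return "block"
    return "allow"
```

Note the asymmetry this makes visible: only the command-injection rule applies to first-party agents, while the other three leave first-party access untouched.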

Cumulative risk circuit breaker

As risk accumulates across turns, two graduated thresholds apply:

| Threshold | session_cumulative_risk_score | Effect |
| --- | --- | --- |
| Restriction | > 200 | Non-first-party agents restricted from sensitive tools |
| Full lockdown | > 500 or session_threat_turns > 5 | Unverified agents blocked from all tool calls |

The 200 threshold is deliberately higher than the A2A equivalent (150) because the orchestrator can intervene between turns, providing an additional layer of containment that A2A systems lack.
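The two graduated thresholds can be expressed as a short check. This Python sketch uses the context field names and numeric limits from the table; the function name, the `tool_category` parameter, and the "standard" category are assumptions for illustration.

```python
RESTRICTION_SCORE = 200    # non-first-party agents lose sensitive tools above this
LOCKDOWN_SCORE = 500       # unverified agents lose all tool calls above this
LOCKDOWN_THREAT_TURNS = 5  # ...or once threat turns exceed this


def circuit_breaker(session_cumulative_risk_score: int,
                    session_threat_turns: int,
                    agent_trust_level: str,
                    tool_category: str) -> str:
    """Return "allow" or "block" based on accumulated session risk."""
    first_party = agent_trust_level == "first_party"
    unverified = agent_trust_level == "unverified"

    # Full lockdown: applies to every tool call by an unverified agent.
    if unverified and (session_cumulative_risk_score > LOCKDOWN_SCORE
                       or session_threat_turns > LOCKDOWN_THREAT_TURNS):
        return "block"

    # Restriction: applies only to sensitive tools for non-first-party agents.
    if (not first_party
            and session_cumulative_risk_score > RESTRICTION_SCORE
            and tool_category == "sensitive"):
        return "block"

    return "allow"
```

Both comparisons are strict, so a score of exactly 200 does not yet trigger the restriction.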


Applying MAS profiles

Profiles can be applied in Highflame Studio under Shield → Policies → Apply profile.

The agent_trust rules (tier gating by trust level) have near-zero false positive rates and can be applied in Block mode immediately. Run the agent_safety session circuit breakers in Monitor mode for one week first, review detection rates across your session population, and then switch to Block.

| Profile | Recommended initial mode |
| --- | --- |
| agent_trust | Block |
| agent_safety | Monitor → Block after review |


Passing agent identity in context

MAS policies evaluate the agent_trust_level, agent_type, and agent_id context fields on every request. These fields are projected automatically from the ZeroID JWT when agents authenticate through the Agent Gateway or Shield SDK. For agents that do not use the gateway, set these fields explicitly in the request context.

If agent_trust_level is not provided, the evaluation engine defaults to unverified. Any policy that restricts unverified agents will apply.
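To make the default concrete, the sketch below shows a context carrying the three fields and a helper that mirrors the documented fallback to unverified. The field names come from the text. The dict shape, the example values for agent_id and agent_type, and the helper name are assumptions, and this is not the Shield SDK API.

```python
def effective_trust_level(context: dict) -> str:
    """Mirror the documented default: a missing trust level means unverified."""
    return context.get("agent_trust_level") or "unverified"


# Hypothetical context for an agent that sets its identity explicitly.
explicit_context = {
    "agent_id": "billing-summarizer",   # example value, not from the source
    "agent_type": "autonomous",         # example value, not from the source
    "agent_trust_level": "first_party",
}

# The same agent with the trust level omitted falls back to unverified.
missing_context = {
    "agent_id": "billing-summarizer",
    "agent_type": "autonomous",
}
```

Omitting agent_trust_level is therefore the most restrictive choice, not a neutral one: every rule that targets unverified agents will apply to that request.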

