Multi-Agent
Multi-agent system (MAS) policies enforce trust-tiered access control and cross-turn session safety in orchestrated multi-agent systems, where a central orchestrator coordinates sub-agents that share a common trust context.
How MAS differs from A2A
In an orchestrated multi-agent system, a single orchestrator controls which sub-agents run and in what order. Sub-agents operate within a shared session and do not communicate peer-to-peer. This changes the threat model:
Cross-origin attacks are unlikely — all agents share the orchestrator's trust context
Session-level threats are the primary vector — an attacker who compromises one turn can affect all subsequent turns
Autonomous sub-agents need tighter thresholds — no human oversight means injection and jailbreak signals must be acted on earlier
Command injection is treated as a full-session threat — after a command injection fires, even first-party agents are blocked from shell access
Compare this to A2A policies, which focus on identity spoofing and confused deputy attacks that arise when agents operate without a central coordinator.
| Aspect | MAS | A2A |
| --- | --- | --- |
| Trust establishment | Orchestrator-mediated | Self-reported via ZeroID |
| Primary attack vector | Cross-turn session escalation | Identity spoofing, confused deputy |
| Cross-origin rules | Not included | Critical signal (5 rules) |
| Cumulative risk threshold | 200 | 150 |
| Threat turn lockout | 5 turns | 3 turns |
| Post-command-injection scope | All agents (including first-party) | Per-agent restrictions |
| Autonomous agent thresholds | Injection/jailbreak lowered to 50 | Blocked if unverified |
Policy profiles
MAS policies are organized into two profiles located in schemas/guardrails/templates/profiles/multi_agent/.
1. Agent trust
Profile: agent_trust.cedar
Controls which agents can perform which operations based on their trust level and type. These rules gate access before any content-based detection runs.
Dangerous tools — first-party only
Only agents with agent_trust_level == "first_party" can invoke tools in the dangerous category. Third-party and unverified agents are unconditionally blocked regardless of the payload.
Sensitive tools — verified minimum
Sensitive tools require at minimum verified_third_party trust. Unverified agents are blocked from sensitive tool access entirely.
Double-unverified MCP block
An unverified agent attempting to connect to an unverified MCP server is blocked. When neither the agent nor the tool server can be attested, the combination is too high risk to allow.
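The three trust gates above can be sketched in Python as a single decision function. This is an illustration of the described logic, not the contents of agent_trust.cedar: the trust level names and the agent_trust_level field come from this page, while `ToolRequest`, `tool_category`, `is_mcp_connection`, `mcp_server_verified`, and the numeric ranking are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed ordering of trust levels, lowest to highest. The level names appear
# in this page; the numeric ranks are illustrative only.
TRUST_RANK = {"unverified": 0, "verified_third_party": 1, "first_party": 2}


@dataclass(frozen=True)
class ToolRequest:
    agent_trust_level: str             # context field named on this page
    tool_category: str                 # hypothetical: "dangerous", "sensitive", "standard"
    is_mcp_connection: bool = False    # hypothetical flag: request targets an MCP server
    mcp_server_verified: bool = True   # hypothetical flag: MCP server is attested


def trust_gate(req: ToolRequest) -> str:
    """Return "block" or "allow" based only on trust tier, not on payload."""
    # Unknown trust levels are treated as unverified in this sketch.
    rank = TRUST_RANK.get(req.agent_trust_level, 0)

    # Dangerous tools: first-party only.
    if req.tool_category == "dangerous" and rank < TRUST_RANK["first_party"]:
        return "block"
    # Sensitive tools: verified_third_party at minimum.
    if req.tool_category == "sensitive" and rank < TRUST_RANK["verified_third_party"]:
        return "block"
    # Double-unverified MCP block: unverified agent plus unverified server.
    if req.is_mcp_connection and rank == 0 and not req.mcp_server_verified:
        return "block"
    return "allow"
```

Because the function never inspects the payload, it mirrors the point that these rules gate access before any content-based detection runs.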
Autonomous agent caps
Autonomous agents — those operating without human oversight in the session loop — are subject to lower thresholds on all detection signals:
| Signal | Standard agents | Autonomous agents |
| --- | --- | --- |
| Tool risk ceiling | Unrestricted for first-party | <= 70 regardless of trust level |
| Injection confidence block | 80 | 50 |
| Jailbreak confidence block | 80 | 50 |
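A minimal sketch of how the autonomous caps might be evaluated, assuming scores on a 0 to 100 scale and assuming a detection blocks when its confidence is at or above the threshold (the page does not say whether the comparison is inclusive). The function name, its parameters, and the `"autonomous"` value for agent_type are hypothetical.

```python
STANDARD_BLOCK = 80                # injection/jailbreak block threshold, standard agents
AUTONOMOUS_BLOCK = 50              # lowered threshold for autonomous agents
AUTONOMOUS_TOOL_RISK_CEILING = 70  # autonomous agents may use tools with risk <= 70


def detection_gate(agent_type: str, tool_risk: int,
                   injection_conf: int, jailbreak_conf: int) -> str:
    """Return "block" or "allow" for one request under the caps above."""
    autonomous = agent_type == "autonomous"  # assumed agent_type value
    threshold = AUTONOMOUS_BLOCK if autonomous else STANDARD_BLOCK

    # Inclusive comparison is an assumption of this sketch.
    if injection_conf >= threshold or jailbreak_conf >= threshold:
        return "block"
    # The tool risk ceiling applies to autonomous agents at every trust level.
    # Ceilings for non-autonomous agents are not modeled here; the page only
    # states that first-party agents are unrestricted.
    if autonomous and tool_risk > AUTONOMOUS_TOOL_RISK_CEILING:
        return "block"
    return "allow"
```

An injection confidence of 60 is allowed for a supervised agent but blocked for an autonomous one, which is the practical effect of lowering the threshold from 80 to 50.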
2. Agent safety
Profile: agent_safety.cedar
Session-aware policies that lock down agent capabilities after a threat signal fires in any earlier turn. These rules implement circuit-breaker behavior across the session — a single high-confidence detection changes what every subsequent turn is allowed to do.
Post-PII detection: network and file lockdown
After PII is detected anywhere in the session (session_pii_detected == true):
Non-first-party agents are blocked from network tools (http_post, send_email, http_request, webhook), preventing exfiltration of PII that was just observed
Unverified agents are blocked from file writes, preventing PII from being written to disk and later retrieved
Post-secrets detection: sensitive tool lockdown
After secrets are detected in the session (session_secrets_detected == true), non-first-party agents are blocked from all sensitive tools. This prevents a credential observed in one turn from being used by a downstream agent in a later turn.
Post-injection detection: unverified agent lockdown
After a prompt injection fires in the session (session_injection_detected == true), unverified agents are blocked from all tool calls. An injection that succeeded against one agent in the session cannot be carried forward by another unverified agent.
Post-command-injection: full shell lockdown
After command injection fires, all agents — including first-party — are blocked from shell access for the remainder of the session. Command injection represents a persistent compromise of the execution environment; the shell is closed until the session ends.
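The four session lockdowns above can be sketched as one function over session flags. The flags session_pii_detected, session_secrets_detected, and session_injection_detected and the network tool names come from this page; the `session_command_injection_detected` flag name, the `tool_category` values, and the function itself are assumptions for illustration, not the contents of agent_safety.cedar.

```python
NETWORK_TOOLS = {"http_post", "send_email", "http_request", "webhook"}


def session_gate(session: dict, trust_level: str,
                 tool_name: str, tool_category: str) -> str:
    """Return "block" or "allow" given threat flags set in earlier turns.

    tool_category is hypothetical: "shell", "file_write", "sensitive", "standard".
    """
    first_party = trust_level == "first_party"
    unverified = trust_level == "unverified"

    # Command injection: shell closed for every agent, including first-party.
    # The flag name is an assumption; the page does not give it.
    if session.get("session_command_injection_detected") and tool_category == "shell":
        return "block"
    # Prompt injection: unverified agents lose all tool calls.
    if session.get("session_injection_detected") and unverified:
        return "block"
    # Secrets: non-first-party agents lose sensitive tools.
    if (session.get("session_secrets_detected") and not first_party
            and tool_category == "sensitive"):
        return "block"
    if session.get("session_pii_detected"):
        # PII: non-first-party agents lose network tools.
        if not first_party and tool_name in NETWORK_TOOLS:
            return "block"
        # PII: unverified agents lose file writes.
        if unverified and tool_category == "file_write":
            return "block"
    return "allow"
```

Each flag is read from session state rather than from the current request, so a detection in one turn changes the outcome of later turns.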
Cumulative risk circuit breaker
As risk accumulates across turns, two graduated thresholds apply:
| Threshold | session_cumulative_risk_score | Effect |
| --- | --- | --- |
| Restriction | > 200 | Non-first-party agents restricted from sensitive tools |
| Full lockdown | > 500 or session_threat_turns > 5 | Unverified agents blocked from all tool calls |
The 200 threshold is deliberately higher than the A2A equivalent (150) because the orchestrator can intervene between turns, providing an additional layer of containment that A2A systems lack.
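The two graduated thresholds can be sketched as follows. The field names session_cumulative_risk_score and session_threat_turns and the numeric limits come from this page; the function, its parameters, and the `"sensitive"` category label are assumptions for illustration.

```python
RESTRICTION_SCORE = 200   # above this, non-first-party agents lose sensitive tools
LOCKDOWN_SCORE = 500      # above this, unverified agents lose all tool calls
LOCKDOWN_THREAT_TURNS = 5  # more than this many threat turns also triggers lockdown


def circuit_breaker(session_cumulative_risk_score: int, session_threat_turns: int,
                    trust_level: str, tool_category: str) -> str:
    """Return "block" or "allow" based on accumulated session risk."""
    lockdown = (session_cumulative_risk_score > LOCKDOWN_SCORE
                or session_threat_turns > LOCKDOWN_THREAT_TURNS)
    if lockdown and trust_level == "unverified":
        return "block"
    if (session_cumulative_risk_score > RESTRICTION_SCORE
            and trust_level != "first_party"
            and tool_category == "sensitive"):  # hypothetical category label
        return "block"
    return "allow"
```

Both comparisons are strict, matching the `>` in the table: a score of exactly 200 does not yet trigger the restriction.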
Applying MAS profiles
Profiles can also be applied in Highflame Studio → Shield → Policies → Apply profile.
Recommended rollout
agent_trust rules (tier-gating by trust level) have near-zero false positive rates and can be applied in Block mode immediately. agent_safety session circuit-breakers should be run in Monitor mode for one week to review detection rates across your session population before switching to Block.
| Profile | Recommended mode |
| --- | --- |
| agent_trust | Block |
| agent_safety | Monitor → Block after review |
Passing agent identity in context
MAS policies evaluate the agent_trust_level, agent_type, and agent_id context fields on every request. These fields are projected automatically from the ZeroID JWT when agents authenticate through the Agent Gateway or Shield SDK. For agents that do not use the gateway, set them explicitly:
If agent_trust_level is not provided, the evaluation engine defaults to unverified. Any policy that restricts unverified agents will apply.
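A minimal sketch of an explicit context and of the default described above, assuming the context is passed as a plain dictionary. The three field names come from this page; the helper `resolve_agent_context`, the example values, and the treatment of an empty value as missing are hypothetical and do not represent the Shield SDK or Agent Gateway API.

```python
def resolve_agent_context(context: dict) -> dict:
    """Return a copy of the context with the documented default applied.

    Mirrors the stated behavior: a missing agent_trust_level is evaluated
    as "unverified". Treating an empty value as missing is an assumption.
    """
    resolved = dict(context)
    if not resolved.get("agent_trust_level"):
        resolved["agent_trust_level"] = "unverified"
    return resolved


# Explicit identity for an agent that does not authenticate through the gateway.
# All values here are illustrative.
explicit = resolve_agent_context({
    "agent_trust_level": "verified_third_party",
    "agent_type": "autonomous",
    "agent_id": "research-agent-01",
})

# No trust level supplied: every rule that restricts unverified agents applies.
implicit = resolve_agent_context({"agent_id": "legacy-agent"})
```

Setting agent_trust_level explicitly matters most for first-party agents outside the gateway, since omitting it subjects them to the strictest tier.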
Related
A2A Policies — policies for independent peer-to-peer agent communication
Policy Templates — all available policy profiles and how to combine them
Cedar Cookbook — writing custom rules for multi-agent scenarios
Agent Identity (ZeroID) — issuing and managing agent identities
Agent Delegation — scoping delegated identity in multi-agent workflows