A2A
Agent-to-Agent (A2A) policies enforce trust boundaries in multi-agent systems where agents communicate peer-to-peer rather than through a central orchestrator. They extend the standard Guardrails policy model with identity-aware rules that account for the unique threat surface of inter-agent communication.
Why A2A policies are different
In a traditional orchestrated multi-agent system (MAS), a central orchestrator validates sub-agents and controls their access. A2A systems are different: agents operate independently, each agent self-reports its identity, and agents may receive inputs from other agents they do not fully control.
This creates attack vectors that standard prompt and tool guardrails do not address:
| Attack vector | Description |
|---|---|
| Identity spoofing | An agent claims a trust level or type it does not have, gaining elevated access |
| Indirect injection | A malicious payload is embedded in another agent's output and consumed downstream |
| Confused deputy | An agent is manipulated into making cross-origin requests outside its intended trust domain |
| Supply chain attacks | Tool descriptions in MCP servers are poisoned to redirect agent behavior |
| Behavioral drift (rug pull) | An agent behaves legitimately until it has established enough trust, then changes behavior |
| Session escalation | Low-severity signals accumulate across a session until the risk threshold is exceeded |
A2A policies detect each of these attacks and enforce a decision at Cedar policy evaluation time, before any tool executes or any response is returned.
Trust model
Every A2A policy evaluation is grounded in the calling agent's identity, which flows from a ZeroID JWT into the Cedar evaluation context. Three fields determine how restrictive the applicable policies are:
Trust level
| Value | Meaning | Typical examples |
|---|---|---|
| first_party | Agent owned and operated by your organization | Your own agents registered in ZeroID |
| verified_third_party | Independently verified agent from an audited publisher | Verified partner agents with ZeroID attestation |
| unverified | No verification; identity is self-asserted only | External agents, community models, unknown callers |
Agent type
| Value | Meaning |
|---|---|
| autonomous | Operates without human oversight; highest risk, most restricted |
| tool_agent | Executes specific tools under bounded scope |
| human_proxy | Acts on behalf of a human in the session loop |
| orchestrator | Routes to sub-agents; subject to orchestrator-specific rules |
Agent framework
An agent_framework declaration (for example claude-code, langchain, or crewai) is required for any unverified agent requesting sensitive operations. An unverified agent that omits it is blocked.
How identity reaches Cedar
ZeroID issues a JWT containing agent identity claims. The Shield SDK and Agent Gateway project these claims into the Cedar evaluation context automatically.
The projected fields are available in every Cedar policy as context.agent_id, context.agent_trust_level, context.agent_type, context.agent_framework, and so on.
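The projection itself happens inside the Shield SDK and Agent Gateway. As a rough illustration only, the Python sketch below shows the shape of that mapping for claims whose signature has already been verified. The input claim keys (`sub`, `trust_level`, `agent_type`, `framework`) and the `project_claims` helper are hypothetical, not ZeroID's documented claim names, and downgrading unrecognized trust values to `unverified` is an assumed fail-safe choice.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Allowed values, taken from the trust model tables above.
TRUST_LEVELS = {"first_party", "verified_third_party", "unverified"}
AGENT_TYPES = {"autonomous", "tool_agent", "human_proxy", "orchestrator"}


@dataclass(frozen=True)
class AgentContext:
    """The identity fields Cedar policies read as context.<field>."""
    agent_id: str
    agent_trust_level: str
    agent_type: str
    agent_framework: str


def project_claims(claims: Mapping[str, Any]) -> AgentContext:
    """Map already-verified JWT claims onto context fields.

    HYPOTHETICAL claim keys: "sub", "trust_level", "agent_type", "framework".
    Replace them with the claim names your ZeroID tokens actually carry.
    """
    trust = claims.get("trust_level", "unverified")
    if trust not in TRUST_LEVELS:
        # Assumed fail-safe: unknown trust values get the least-trusted level.
        trust = "unverified"
    agent_type = claims.get("agent_type", "")
    if agent_type not in AGENT_TYPES:
        agent_type = ""  # leave unset rather than guess a type
    return AgentContext(
        agent_id=claims.get("sub", ""),
        agent_trust_level=trust,
        agent_type=agent_type,
        agent_framework=claims.get("framework", ""),
    )
```

Failing closed on an unrecognized trust level matches the identity enforcement profile's stance that incomplete identity is a spoofing indicator rather than a configuration gap.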
Policy profiles
A2A policies are organized into five profiles. Each profile can be applied independently. All five are available in highflame-policy under schemas/guardrails/templates/profiles/a2a_security/.
1. Identity enforcement
Profile: identity_enforcement.cedar
Prevents agents from operating with incomplete or inconsistent identity claims. Incomplete identity is treated as a spoofing indicator — not as a benign configuration gap.
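| Rule | Condition | Applies to |
|---|---|---|
| Anonymous agent | agent_type present, agent_id empty | Tool calls |
| Unregistered framework | agent_trust_level == "unverified" and no agent_framework | Sensitive tools |
| Unverified MCP connection | agent_trust_level == "unverified" | connect_server |
| Autonomous + unverified | agent_type == "autonomous" and agent_trust_level == "unverified" | All tool calls |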
The most critical rule is the last one: autonomous agents with unverified trust are unconditionally blocked from all tool execution. This combination — no human oversight and no identity attestation — is treated as too high a risk to allow regardless of payload content.
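The four identity rules can be modeled as plain predicates over the context fields. This Python sketch is not the Cedar profile; it assumes hypothetical action names (`call_tool`, `connect_server`) and a hypothetical `sensitive_tool` flag, and returns the names of the rules that would fire. Because Cedar denies a request whenever any `forbid` policy matches, a non-empty result means the request is blocked.

```python
from dataclasses import dataclass


@dataclass
class Request:
    agent_id: str
    agent_type: str
    agent_trust_level: str
    agent_framework: str
    action: str                   # HYPOTHETICAL action names: "call_tool", "connect_server"
    sensitive_tool: bool = False  # HYPOTHETICAL flag marking sensitive tools


def identity_violations(req: Request) -> list[str]:
    """Names of identity_enforcement rules that match req (any match means deny)."""
    hits = []
    is_tool_call = req.action == "call_tool"
    unverified = req.agent_trust_level == "unverified"
    # Anonymous agent: a type is claimed but no agent_id is present.
    if is_tool_call and req.agent_type and not req.agent_id:
        hits.append("anonymous_agent")
    # Unregistered framework: unverified agent with no framework, on a sensitive tool.
    if is_tool_call and req.sensitive_tool and unverified and not req.agent_framework:
        hits.append("unregistered_framework")
    # Unverified MCP connection: unverified agents may not connect to servers.
    if req.action == "connect_server" and unverified:
        hits.append("unverified_mcp_connection")
    # Autonomous + unverified: blocked from every tool call, whatever the payload.
    if is_tool_call and req.agent_type == "autonomous" and unverified:
        hits.append("autonomous_unverified")
    return hits
```

For example, an autonomous, unverified agent is blocked even when it declares a framework and calls a non-sensitive tool: `identity_violations(Request("a1", "autonomous", "unverified", "langchain", "call_tool"))` returns `["autonomous_unverified"]`.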
2. Indirect injection detection
Profile: inter_agent_injection.cedar
Detects injection payloads that arrive through another agent's outputs rather than directly from a user. This is the most common A2A-specific attack vector: a downstream agent consumes tool results or RAG retrievals that have been poisoned by an upstream source the receiving agent does not control.
The indirect_injection_score context field is produced by a deep-context detector that analyzes multi-turn patterns across the session, including encoded payloads (base64, hex) that attempt to bypass text-based filters.
| Rule | Condition | Applies to |
|---|---|---|
| General indirect injection | indirect_injection_score >= 60 | Tool calls |
| Sensitive tools with weaker signal | indirect_injection_score >= 40 | High-risk tools only |
| Multi-turn progressive attack | Session GRU model flags escalation | Tool calls + prompts |
| Encoded payload | Encoding pattern detected | All actions |
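A minimal Python model of the four injection rules follows. Only `indirect_injection_score` comes from the profile itself; the action names (`call_tool`, `prompt`) and the two boolean detector outputs (`escalation_flagged` for the session GRU signal, `encoded_payload` for the encoding detector) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class InjectionSignals:
    indirect_injection_score: float
    escalation_flagged: bool = False  # HYPOTHETICAL name for the session GRU escalation flag
    encoded_payload: bool = False     # HYPOTHETICAL name for the encoding-pattern flag


def injection_blocks(sig: InjectionSignals, action: str, high_risk_tool: bool = False) -> bool:
    """True if any inter_agent_injection rule would block this action.

    action uses HYPOTHETICAL names: "call_tool" or "prompt".
    """
    is_tool_call = action == "call_tool"
    if sig.encoded_payload:  # encoded payload: all actions
        return True
    if sig.escalation_flagged and action in ("call_tool", "prompt"):
        return True  # multi-turn progressive attack: tool calls and prompts
    if is_tool_call and sig.indirect_injection_score >= 60:
        return True  # general indirect injection
    if is_tool_call and high_risk_tool and sig.indirect_injection_score >= 40:
        return True  # weaker signal, but only on high-risk tools
    return False
```

With a score of 45, the sketch blocks a high-risk tool but lets an ordinary tool call through, which is the purpose of the lower 40 threshold for sensitive tools.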
3. Cross-origin (confused deputy)
Profile: cross_origin.cedar
A confused deputy attack manipulates an agent into proxying requests across trust boundaries — for example, using an agent's credentials to access an internal resource on behalf of an external caller, or forwarding a response to a domain outside the agent's intended scope.
The cross_origin_score context field scores the degree of cross-origin mixing in the request, from mixed HTTP/HTTPS schemes (score: 70) up to direct URL injection in tool parameters (score: 85) or localhost-plus-external mixing (score: 90).
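To illustrate how the three named signals map to scores, here is a toy Python scorer. It is not the Highflame detector: it inspects only the URLs passed to it, takes the highest matching score (an assumption about how signals combine), and relies on the caller to say whether a URL was injected into tool parameters, since the text does not describe how that is detected. `cross_origin_score`, its parameters, and the `LOCAL_HOSTS` set are hypothetical.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # HYPOTHETICAL notion of "local"


def cross_origin_score(urls: list[str], url_in_tool_params: bool = False) -> int:
    """HYPOTHETICAL toy scorer using the three signal scores named in the text.

    Signals are combined by taking the maximum (an assumption).
    url_in_tool_params must be supplied by the caller.
    """
    parsed = [urlparse(u) for u in urls]
    schemes = {p.scheme for p in parsed}
    hosts = {p.hostname for p in parsed if p.hostname}
    has_local = any(h in LOCAL_HOSTS for h in hosts)
    has_external = any(h not in LOCAL_HOSTS for h in hosts)

    score = 0
    if {"http", "https"} <= schemes:  # mixed HTTP/HTTPS schemes
        score = max(score, 70)
    if url_in_tool_params:  # direct URL injection in tool parameters
        score = max(score, 85)
    if has_local and has_external:  # localhost mixed with an external host
        score = max(score, 90)
    return score
```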
| Rule | cross_origin_score | Applies to |
|---|---|---|
| Critical cross-origin | >= 80 | All agents, all actions |
| Moderate cross-origin for unverified agents | >= 60 | Unverified agents only |
| Server connection with cross-origin signal | >= 65 | connect_server action |
| Sensitive tools with cross-origin present | >= 60 | High-risk tools |
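The rules above can be expressed as one small predicate. In this sketch, `cross_origin_blocks`, the action names, and the `high_risk_tool` flag are hypothetical; the moderate rule for unverified agents is assumed to apply to every action, since the table does not restrict it.

```python
def cross_origin_blocks(score: float, trust_level: str, action: str,
                        high_risk_tool: bool = False) -> bool:
    """True if any cross_origin rule would block the request.

    action uses HYPOTHETICAL names such as "call_tool", "connect_server", "prompt".
    """
    if score >= 80:  # critical: all agents, all actions
        return True
    if trust_level == "unverified" and score >= 60:  # moderate; assumed to cover all actions
        return True
    if action == "connect_server" and score >= 65:  # server connection with cross-origin signal
        return True
    if action == "call_tool" and high_risk_tool and score >= 60:  # sensitive tools
        return True
    return False
```

A mixed HTTP/HTTPS request (score 70) is allowed for a first-party agent answering a prompt, blocked when the same agent tries to connect to a server, and blocked outright for an unverified agent.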
4. Supply chain protection
Profile: supply_chain.cedar
Covers attacks targeting the tool ecosystem that agents rely on, specifically:
Tool poisoning — hidden instructions embedded in tool descriptions, metadata, or system prompts. Because tool descriptions are loaded by the agent (not typed by a user), poisoned descriptions can redirect agent behavior without any visible prompt manipulation. The tool_poisoning_score field is produced by a description-aware detector.
Rug pull (behavioral drift) — an agent or tool behaves correctly until it has accumulated enough trust or context, then pivots to a malicious objective. The rug_pull_detected flag and rug_pull_score capture sudden deviations in behavior pattern.
Credential theft chains — multi-step sequences: read a credential file, encode it, then exfiltrate via a tool call to an external endpoint. The suspicious_pattern and pattern_type context fields identify these sequences.
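The credential theft chain is an ordered pattern, so it can be illustrated as subsequence matching over a session's tool events. This Python sketch assumes each event has already been classified into a hypothetical label (`read_credential`, `encode`, `external_send`); a real detector would derive those labels from tool names and arguments and would then set pattern_type to "credential_theft".

```python
from typing import Iterable

# HYPOTHETICAL event labels for the three steps, in the order they must occur.
CHAIN = ("read_credential", "encode", "external_send")


def detect_credential_theft(events: Iterable[str]) -> bool:
    """True if events contain the chain steps in order; unrelated steps may interleave."""
    step = 0
    for event in events:
        if event == CHAIN[step]:
            step += 1
            if step == len(CHAIN):
                return True
    return False
```

Allowing unrelated steps in between matters: an attacker can pad the chain with harmless calls, and a check that required the three steps back to back would miss it.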
| Rule | Condition | Applies to |
|---|---|---|
| Tool poisoning | tool_poisoning_score >= 60, non-first-party | Tool calls |
| Server poisoning (lower threshold) | tool_poisoning_score >= 55, non-first-party | connect_server |
| Rug pull | rug_pull_score >= 70 | Tool calls |
| Credential theft chain | pattern_type == "credential_theft", non-first-party | Tool calls |
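A Python sketch of the supply chain rules over a dictionary of the context fields named above. The action names are hypothetical, and treating a missing trust level as non-first-party is an assumed fail-safe. Note that the rug pull rule carries no trust-level condition, so it applies to first-party agents too.

```python
from typing import Any, Mapping


def supply_chain_blocks(ctx: Mapping[str, Any], action: str) -> bool:
    """True if any supply_chain rule would block the request.

    ctx holds context fields named in the profile; action uses HYPOTHETICAL
    names ("call_tool", "connect_server").
    """
    # A missing trust level counts as non-first-party here (assumed fail-safe).
    non_first_party = ctx.get("agent_trust_level") != "first_party"
    poisoning = ctx.get("tool_poisoning_score", 0)
    if action == "call_tool":
        if non_first_party and poisoning >= 60:
            return True  # tool poisoning
        if ctx.get("rug_pull_score", 0) >= 70:
            return True  # rug pull, regardless of trust level
        if non_first_party and ctx.get("pattern_type") == "credential_theft":
            return True  # credential theft chain
    if action == "connect_server" and non_first_party and poisoning >= 55:
        return True  # server poisoning, lower threshold
    return False
```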
5. Session escalation detection
Profile: escalation_detection.cedar
Monitors session-level risk scores rather than per-request scores. This catches attacks that distribute their payload across multiple low-scoring turns to stay below per-request thresholds.
Three session aggregate fields drive most of these policies (the jailbreak rule below reads the analogous session_max_jailbreak_score field):
| Field | Meaning |
|---|---|
| session_max_injection_score | Peak injection score observed in the session so far |
| session_cumulative_risk_score | Running total of risk contributions across all turns |
| session_threat_turns | Number of turns where at least one threat signal fired |
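The three aggregates are a running maximum, a running sum, and a counter. The sketch below shows how they could be updated once per turn; the per-turn inputs (`injection_score`, `risk_contribution`, `threat_fired`) and the `record_turn` method are hypothetical, because the text does not specify how each turn's risk contribution is computed.

```python
from dataclasses import dataclass


@dataclass
class SessionAggregates:
    session_max_injection_score: float = 0.0
    session_cumulative_risk_score: float = 0.0
    session_threat_turns: int = 0

    def record_turn(self, injection_score: float, risk_contribution: float,
                    threat_fired: bool) -> None:
        """Fold one turn into the aggregates (HYPOTHETICAL per-turn inputs)."""
        # Peak: keep the highest injection score seen so far.
        self.session_max_injection_score = max(self.session_max_injection_score, injection_score)
        # Running total: every turn adds its risk contribution.
        self.session_cumulative_risk_score += risk_contribution
        # Counter: only turns where some threat signal fired.
        if threat_fired:
            self.session_threat_turns += 1
```

Three turns that each score only 35 for injection stay far below any per-request threshold, yet with a contribution of 55 per turn the cumulative risk reaches 165 and the threat-turn count reaches 3, which is the distributed pattern the session rules target.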
The thresholds are deliberately lower than the equivalent MAS thresholds, because A2A sessions lack a central orchestrator that can intervene between turns.
| Rule | Condition | Applies to |
|---|---|---|
| Session injection peak | session_max_injection_score >= 70, non-first-party | Tool calls + prompts |
| Session jailbreak peak | session_max_jailbreak_score >= 70, non-first-party | Tool calls + prompts |
| Cumulative risk circuit breaker | session_cumulative_risk_score > 150 | Sensitive tools |
| Threat turn lockout | session_threat_turns >= 3, unverified | All tool calls |
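A Python model of the four escalation rules, reading the session aggregates from a dictionary. The action names and the `sensitive_tool` flag are hypothetical; note that the circuit breaker uses a strict "greater than 150" comparison, exactly as the table states, and carries no trust-level condition.

```python
from typing import Any, Mapping


def escalation_blocks(agg: Mapping[str, Any], trust_level: str, action: str,
                      sensitive_tool: bool = False) -> bool:
    """True if any escalation_detection rule would block the request.

    agg holds the session aggregates; action uses HYPOTHETICAL names
    ("call_tool", "prompt"); sensitive_tool is a HYPOTHETICAL flag.
    """
    non_first_party = trust_level != "first_party"
    if action in ("call_tool", "prompt") and non_first_party:
        if agg.get("session_max_injection_score", 0) >= 70:
            return True  # session injection peak
        if agg.get("session_max_jailbreak_score", 0) >= 70:
            return True  # session jailbreak peak
    if action == "call_tool" and sensitive_tool and agg.get("session_cumulative_risk_score", 0) > 150:
        return True  # cumulative risk circuit breaker
    if action == "call_tool" and trust_level == "unverified" and agg.get("session_threat_turns", 0) >= 3:
        return True  # threat turn lockout
    return False
```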
Applying A2A policy profiles
A2A profiles are Cedar policy files. Apply them in the same way as any other guardrail policy — load them into the policy engine alongside your existing rules.
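Cedar accepts many policies in a single policy-set text, so one simple way to load profiles alongside existing rules is to concatenate the files. This sketch assumes the profiles are checked out locally under the highflame-policy path given above; `load_policy_text` is a hypothetical helper, and how the resulting text is handed to your policy engine depends on that engine's API.

```python
from pathlib import Path

# Directory layout from the highflame-policy repository described above.
PROFILE_DIR = "schemas/guardrails/templates/profiles/a2a_security"


def load_policy_text(profile_dir: str, profiles: list[str], existing: str = "") -> str:
    """HYPOTHETICAL helper: append selected .cedar profiles to existing policy text."""
    parts = [existing] if existing else []
    for name in profiles:
        path = Path(profile_dir) / name
        # A Cedar line comment marks where each profile starts.
        parts.append(f"// --- {name} ---\n" + path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

Following the recommended rollout, a first call might be `load_policy_text(PROFILE_DIR, ["identity_enforcement.cedar", "supply_chain.cedar"], existing=current_rules)`, where `current_rules` is your existing policy text.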
Profiles can also be applied in Highflame Studio → Shield → Policies → Apply profile. Changes take effect on the next request without a redeploy.
Recommended rollout
Start with the identity enforcement and supply chain profiles — both have very low false positive rates on legitimate traffic. Enable indirect injection and cross-origin in Monitor mode for one week to review detections before switching to Block. Enable session escalation detection last, after you have calibrated the per-request thresholds.
| Profile | Recommended mode |
|---|---|
| Identity enforcement | Block |
| Supply chain | Block |
| Cross-origin | Monitor → Block after review |
| Indirect injection | Monitor → Block after review |
| Session escalation | Monitor → Block after review |
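One way to picture Monitor versus Block is as a per-profile switch applied after a profile's rules fire. In this sketch, `ROLLOUT` encodes the starting modes from the table, and Monitor is assumed to mean "record the detection but let the request through"; `enforce` and the fail-closed default for unknown profiles are hypothetical choices, not documented Highflame behavior.

```python
# Starting modes from the rollout table; flip entries to "block" after review.
ROLLOUT = {
    "identity_enforcement.cedar": "block",
    "supply_chain.cedar": "block",
    "cross_origin.cedar": "monitor",
    "inter_agent_injection.cedar": "monitor",
    "escalation_detection.cedar": "monitor",
}


def enforce(profile: str, would_block: bool, log: list[str]) -> bool:
    """HYPOTHETICAL: return True if the request may proceed.

    Monitor mode records the detection in log but allows the request.
    """
    if not would_block:
        return True
    mode = ROLLOUT.get(profile, "block")  # unknown profile: fail closed (assumption)
    log.append(f"{mode}: {profile}")
    return mode == "monitor"
```

Reviewing the collected log entries for a week gives the false-positive evidence needed before changing a profile's entry to "block".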
Comparing A2A and MAS policy thresholds
A2A policies use tighter thresholds than the equivalent MAS policies because there is no central orchestrator to catch what slips through:
| Threshold | A2A | MAS |
|---|---|---|
| Session cumulative risk | 150 | 200 |
| Session threat turns (lockout) | 3 | 5 |
| Session injection peak | 70 | 80 |
| Tool poisoning (non-first-party) | 60 | 65 |
If you are running a hybrid architecture — some agents orchestrated, some operating peer-to-peer — apply A2A profiles to the agents participating in peer-to-peer communication and MAS profiles to sub-agents under centralized orchestration.
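To see the effect of the tighter numbers, the sketch below evaluates one state against both threshold sets and picks a set per agent as the hybrid guidance suggests. It simplifies heavily: it ignores the trust-level conditions on each rule, places the per-request tool poisoning score next to the session aggregates for comparison, and assumes the MAS rules use the same comparison operators as the A2A ones. `THRESHOLDS`, `profile_family`, and `tripped` are hypothetical names.

```python
# Values from the comparison table above.
THRESHOLDS = {
    "a2a": {"cumulative_risk": 150, "threat_turns": 3, "injection_peak": 70, "tool_poisoning": 60},
    "mas": {"cumulative_risk": 200, "threat_turns": 5, "injection_peak": 80, "tool_poisoning": 65},
}


def profile_family(peer_to_peer: bool) -> str:
    """Hybrid rule: peer-to-peer agents get A2A profiles, orchestrated sub-agents get MAS."""
    return "a2a" if peer_to_peer else "mas"


def tripped(state: dict, family: str) -> list[str]:
    """Names of thresholds the state exceeds (operators assumed equal for both families)."""
    t = THRESHOLDS[family]
    hits = []
    if state["cumulative_risk"] > t["cumulative_risk"]:
        hits.append("cumulative_risk")
    if state["threat_turns"] >= t["threat_turns"]:
        hits.append("threat_turns")
    if state["injection_peak"] >= t["injection_peak"]:
        hits.append("injection_peak")
    if state["tool_poisoning"] >= t["tool_poisoning"]:
        hits.append("tool_poisoning")
    return hits
```

A state with cumulative risk 165, 4 threat turns, an injection peak of 75, and a tool poisoning score of 62 trips all four A2A thresholds but none of the MAS ones.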
Related
Cedar Cookbook — patterns for writing custom A2A policy rules
Guardrail Evaluations — how the evaluation lifecycle works
Agent Identity (ZeroID) — issuing and managing agent identities
Agent Delegation — scoping delegated identity in multi-agent workflows