A2A

Agent-to-Agent (A2A) policies enforce trust boundaries in multi-agent systems where agents communicate peer-to-peer rather than through a central orchestrator. They extend the standard Guardrails policy model with identity-aware rules that account for the unique threat surface of inter-agent communication.


Why A2A policies are different

In a traditional orchestrated multi-agent system (MAS), a central orchestrator validates sub-agents and controls their access. A2A systems are different: agents operate independently, each agent self-reports its identity, and agents may receive inputs from other agents they do not fully control.

This creates attack vectors that standard prompt and tool guardrails do not address:

| Threat | How it arises in A2A |
| --- | --- |
| Identity spoofing | An agent claims a trust level or type it does not have, gaining elevated access |
| Indirect injection | A malicious payload is embedded in another agent's output and consumed downstream |
| Confused deputy | An agent is manipulated into making cross-origin requests outside its intended trust domain |
| Supply chain attacks | Tool descriptions in MCP servers are poisoned to redirect agent behavior |
| Behavioral drift (rug pull) | An agent behaves legitimately until it has established enough trust, then changes behavior |
| Session escalation | Low-severity signals accumulate across a session until the risk threshold is exceeded |

A2A policies detect and block all of these at Cedar policy evaluation time, before any tool executes or any response is returned.


Trust model

Every A2A policy evaluation is grounded in the calling agent's identity, which flows from a ZeroID JWT into the Cedar evaluation context. Three fields determine how restrictive the applicable policies are:

Trust level

| Value | Meaning | Typical source |
| --- | --- | --- |
| first_party | Agent owned and operated by your organization | Your own agents registered in ZeroID |
| verified_third_party | Independently verified agent from an audited publisher | Verified partner agents with ZeroID attestation |
| unverified | No verification; identity is self-asserted only | External agents, community models, unknown callers |

Agent type

| Value | Meaning |
| --- | --- |
| autonomous | Operates without human oversight; highest risk, most restricted |
| tool_agent | Executes specific tools under bounded scope |
| human_proxy | Acts on behalf of a human in the session loop |
| orchestrator | Routes to sub-agents; subject to orchestrator-specific rules |

Agent framework

The framework declaration (claude-code, langchain, crewai, etc.) is required for any unverified agent requesting sensitive operations. An unverified agent that omits the declaration is blocked from those operations.

How identity reaches Cedar

ZeroID issues a JWT containing agent identity claims. The Shield SDK and Agent Gateway project these claims into the Cedar evaluation context automatically.

The projected fields are available in every Cedar policy as context.agent_id, context.agent_trust_level, etc.
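The projection step can be pictured with a minimal Python sketch. It is illustrative only and is not the Shield SDK or Agent Gateway implementation: the context keys agent_id, agent_trust_level, agent_type, and agent_framework appear on this page, but the JWT claim names on the left of the mapping and the function name project_claims are assumptions. Signature verification of the JWT is assumed to have happened already.

```python
from typing import Any, Dict

# Context keys on the right are the field names used on this page.
# The JWT claim names on the left are ASSUMED for illustration.
CLAIM_TO_CONTEXT = {
    "agent_id": "agent_id",
    "trust_level": "agent_trust_level",
    "agent_type": "agent_type",
    "framework": "agent_framework",
}


def project_claims(claims: Dict[str, Any]) -> Dict[str, str]:
    """Copy identity claims from an already-verified JWT payload into a flat context.

    Missing claims become empty strings so that a policy can test for absence
    (for example, an empty agent_id or agent_framework).
    """
    context: Dict[str, str] = {}
    for claim_name, context_key in CLAIM_TO_CONTEXT.items():
        value = claims.get(claim_name)
        context[context_key] = str(value) if value is not None else ""
    return context
```

Normalizing absent claims to empty strings keeps "missing" and "empty" identical from the policy's point of view, which matters for rules that treat incomplete identity as a spoofing indicator.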


Policy profiles

A2A policies are organized into five profiles. Each profile can be applied independently. All five are available in highflame-policy under schemas/guardrails/templates/profiles/a2a_security/.


1. Identity enforcement

Profile: identity_enforcement.cedar

Prevents agents from operating with incomplete or inconsistent identity claims. Incomplete identity is treated as a spoofing indicator — not as a benign configuration gap.

| Rule | Condition | Action blocked |
| --- | --- | --- |
| Anonymous agent | agent_type present, agent_id empty | Tool calls |
| Unregistered framework | unverified + no agent_framework | Sensitive tools |
| Unverified MCP connection | agent_trust_level == "unverified" | connect_server |
| Autonomous + unverified | Both conditions simultaneously | All tool calls |

The most critical rule is the last one: autonomous agents with unverified trust are unconditionally blocked from all tool execution. This combination — no human oversight and no identity attestation — is treated as too high a risk to allow regardless of payload content.
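To make the four identity rules concrete, here is a rough Python model of their logic. The real rules live in identity_enforcement.cedar; this sketch only mirrors the conditions listed above. The action name "call_tool", the sensitive_tool flag, and the violation labels are assumptions introduced for illustration, while "connect_server" and the context field names come from this page.

```python
from typing import Dict, List


def identity_violations(
    context: Dict[str, str], action: str, sensitive_tool: bool = False
) -> List[str]:
    """Return the identity rules (as hypothetical labels) that an action would trip."""
    agent_id = context.get("agent_id", "")
    agent_type = context.get("agent_type", "")
    trust = context.get("agent_trust_level", "")
    framework = context.get("agent_framework", "")
    is_tool_call = action == "call_tool"  # assumed action name

    violations: List[str] = []
    # Anonymous agent: a type is declared but no agent_id accompanies it.
    if is_tool_call and agent_type and not agent_id:
        violations.append("anonymous_agent")
    # Unregistered framework: unverified agent, no framework, sensitive tool.
    if is_tool_call and sensitive_tool and trust == "unverified" and not framework:
        violations.append("unregistered_framework")
    # Unverified MCP connection: unverified agents may not connect to servers.
    if action == "connect_server" and trust == "unverified":
        violations.append("unverified_mcp_connection")
    # Autonomous + unverified: blocked from every tool call.
    if is_tool_call and agent_type == "autonomous" and trust == "unverified":
        violations.append("autonomous_unverified")
    return violations
```

Note that the last check ignores payload content entirely, which matches the unconditional nature of the autonomous-plus-unverified rule.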


2. Indirect injection detection

Profile: inter_agent_injection.cedar

Detects injection payloads that arrive through another agent's outputs rather than directly from a user. This is the most common A2A-specific attack vector: a downstream agent consumes tool results or RAG retrievals that have been poisoned by an upstream source the receiving agent does not control.

The indirect_injection_score context field is produced by a deep-context detector that analyzes multi-turn patterns across the session, including encoded payloads (base64, hex) that attempt to bypass text-based filters.

| Rule | Threshold | Action blocked |
| --- | --- | --- |
| General indirect injection | indirect_injection_score >= 60 | Tool calls |
| Sensitive tools with weaker signal | indirect_injection_score >= 40 | High-risk tools only |
| Multi-turn progressive attack | Session GRU model flags escalation | Tool calls + prompts |
| Encoded payload | Encoding pattern detected | All actions |
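The tiered thresholds above can be sketched in Python as follows. This is a simplified model, not the contents of inter_agent_injection.cedar. Only indirect_injection_score is a field name taken from this page; the flags multi_turn_escalation and encoded_payload_detected and the action names "call_tool" and "process_prompt" are hypothetical stand-ins for whatever the detectors and schema actually expose.

```python
from typing import Any, Dict


def injection_blocked(
    context: Dict[str, Any], action: str, high_risk_tool: bool = False
) -> bool:
    """Apply the four indirect-injection rules, from broadest scope to narrowest."""
    score = context.get("indirect_injection_score", 0)

    # Encoded payload: blocks every action.
    if context.get("encoded_payload_detected", False):  # hypothetical flag
        return True
    # Multi-turn progressive attack: blocks tool calls and prompts.
    if context.get("multi_turn_escalation", False) and action in (
        "call_tool",
        "process_prompt",
    ):  # hypothetical flag and action names
        return True
    if action == "call_tool":
        # General rule: any tool call at score 60 or above.
        if score >= 60:
            return True
        # Weaker signal is enough when the tool is high risk.
        if high_risk_tool and score >= 40:
            return True
    return False
```

The two score tiers mean a request scoring 45 passes for an ordinary tool but is blocked for a high-risk one.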


3. Cross-origin (confused deputy)

Profile: cross_origin.cedar

A confused deputy attack manipulates an agent into proxying requests across trust boundaries — for example, using an agent's credentials to access an internal resource on behalf of an external caller, or forwarding a response to a domain outside the agent's intended scope.

The cross_origin_score context field scores the degree of cross-origin mixing in the request, from mixed HTTP/HTTPS schemes (score: 70) up to direct URL injection in tool parameters (score: 85) or localhost-plus-external mixing (score: 90).

| Rule | Threshold | Scope |
| --- | --- | --- |
| Critical cross-origin | cross_origin_score >= 80 | All agents, all actions |
| Moderate cross-origin for unverified agents | cross_origin_score >= 60 | unverified agents only |
| Server connection with cross-origin signal | cross_origin_score >= 65 | connect_server action |
| Sensitive tools with cross-origin present | cross_origin_score >= 60 | High-risk tools |
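As a rough illustration of how these thresholds interact with trust level and action, consider the following Python sketch. It models the table above rather than reproducing cross_origin.cedar. The action name "call_tool", the high_risk_tool flag, and the choice to treat a missing trust level as unverified are assumptions of this sketch.

```python
from typing import Any, Dict


def cross_origin_blocked(
    context: Dict[str, Any], action: str, high_risk_tool: bool = False
) -> bool:
    """Evaluate the four cross-origin rules against a request context."""
    score = context.get("cross_origin_score", 0)
    # ASSUMPTION: a missing trust level is treated as unverified (fail closed).
    trust = context.get("agent_trust_level", "unverified")

    # Critical: applies to all agents and all actions.
    if score >= 80:
        return True
    # Server connections tolerate less cross-origin signal.
    if action == "connect_server" and score >= 65:
        return True
    # Unverified agents are held to the moderate threshold.
    if trust == "unverified" and score >= 60:
        return True
    # High-risk tools are held to the same moderate threshold.
    if action == "call_tool" and high_risk_tool and score >= 60:
        return True
    return False
```

With the example scores given earlier, mixed HTTP/HTTPS schemes (70) block an unverified agent but not a first-party agent calling an ordinary tool, while localhost-plus-external mixing (90) blocks everyone.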


4. Supply chain protection

Profile: supply_chain.cedar

Covers attacks targeting the tool ecosystem that agents rely on, specifically:

Tool poisoning — hidden instructions embedded in tool descriptions, metadata, or system prompts. Because tool descriptions are loaded by the agent (not typed by a user), poisoned descriptions can redirect agent behavior without any visible prompt manipulation. The tool_poisoning_score field is produced by a description-aware detector.

Rug pull (behavioral drift) — an agent or tool behaves correctly until it has accumulated enough trust or context, then pivots to a malicious objective. The rug_pull_detected flag and rug_pull_score capture sudden deviations in behavior pattern.

Credential theft chains — multi-step sequences: read a credential file, encode it, then exfiltrate via a tool call to an external endpoint. The suspicious_pattern and pattern_type context fields identify these sequences.

| Rule | Condition | Scope |
| --- | --- | --- |
| Tool poisoning | tool_poisoning_score >= 60, non-first-party | Tool calls |
| Server poisoning (lower threshold) | tool_poisoning_score >= 55, non-first-party | connect_server |
| Rug pull | rug_pull_score >= 70 | Tool calls |
| Credential theft chain | pattern_type == "credential_theft", non-first-party | Tool calls |
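The supply chain rules can be modeled with a short Python sketch. It assumes the context exposes tool_poisoning_score, rug_pull_score, pattern_type, and agent_trust_level as described on this page; the action name "call_tool" is an assumption, and the function is an illustration rather than the contents of supply_chain.cedar.

```python
from typing import Any, Dict


def supply_chain_blocked(context: Dict[str, Any], action: str) -> bool:
    """Evaluate the four supply chain rules against a request context."""
    non_first_party = context.get("agent_trust_level") != "first_party"
    poisoning = context.get("tool_poisoning_score", 0)

    # Server poisoning: lower threshold, applies at connection time.
    if action == "connect_server":
        return non_first_party and poisoning >= 55
    if action != "call_tool":  # assumed action name for tool execution
        return False

    # Rug pull: applies regardless of trust level.
    if context.get("rug_pull_score", 0) >= 70:
        return True
    # Tool poisoning: non-first-party callers only.
    if non_first_party and poisoning >= 60:
        return True
    # Credential theft chain: non-first-party callers only.
    if non_first_party and context.get("pattern_type") == "credential_theft":
        return True
    return False
```

The rug pull rule is the only one here without a first-party exemption, so behavioral drift is caught even in agents your organization operates.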


5. Session escalation detection

Profile: escalation_detection.cedar

Monitors session-level risk scores rather than per-request scores. This catches attacks that distribute their payload across multiple low-scoring turns to stay below per-request thresholds.

Session aggregate fields drive these policies, including:

| Field | What it tracks |
| --- | --- |
| session_max_injection_score | Peak injection score observed in the session so far |
| session_cumulative_risk_score | Running total of risk contributions across all turns |
| session_threat_turns | Number of turns where at least one threat signal fired |

The thresholds are deliberately lower than the equivalent MAS thresholds, because A2A sessions lack a central orchestrator that can intervene between turns.

| Rule | Threshold | Scope |
| --- | --- | --- |
| Session injection peak | session_max_injection_score >= 70, non-first-party | Tool calls + prompts |
| Session jailbreak peak | session_max_jailbreak_score >= 70, non-first-party | Tool calls + prompts |
| Cumulative risk circuit breaker | session_cumulative_risk_score > 150 | Sensitive tools |
| Threat turn lockout | session_threat_turns >= 3, unverified | All tool calls |
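The following Python sketch shows how an attack spread across low-scoring turns can still trip the session rules. The aggregate field names come from this page, but how each turn's risk contribution is computed is not specified here, so record_turn takes it as an input. The action names "call_tool" and "process_prompt" are assumptions, and the jailbreak peak rule is omitted for brevity because it has the same shape as the injection peak rule.

```python
from dataclasses import dataclass


@dataclass
class SessionAggregates:
    session_max_injection_score: int = 0
    session_cumulative_risk_score: int = 0
    session_threat_turns: int = 0

    def record_turn(
        self, injection_score: int, risk_contribution: int, threat_fired: bool
    ) -> None:
        """Fold one turn's signals into the running session aggregates."""
        self.session_max_injection_score = max(
            self.session_max_injection_score, injection_score
        )
        self.session_cumulative_risk_score += risk_contribution
        if threat_fired:
            self.session_threat_turns += 1


def session_blocked(
    agg: SessionAggregates, trust_level: str, action: str, sensitive_tool: bool = False
) -> bool:
    """Evaluate the injection peak, circuit breaker, and lockout rules."""
    non_first_party = trust_level != "first_party"
    # Session injection peak: tool calls and prompts, non-first-party.
    if (
        non_first_party
        and agg.session_max_injection_score >= 70
        and action in ("call_tool", "process_prompt")
    ):
        return True
    # Cumulative risk circuit breaker: sensitive tools, any trust level.
    if agg.session_cumulative_risk_score > 150 and action == "call_tool" and sensitive_tool:
        return True
    # Threat turn lockout: all tool calls from unverified agents.
    if trust_level == "unverified" and agg.session_threat_turns >= 3 and action == "call_tool":
        return True
    return False
```

In this model, three turns that each score 45 for injection never reach the peak threshold of 70, yet an unverified agent is still locked out after the third turn because session_threat_turns has reached 3.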


Applying A2A policy profiles

A2A profiles are Cedar policy files. Apply them in the same way as any other guardrail policy — load them into the policy engine alongside your existing rules.

Profiles can also be applied in Highflame Studio under Shield → Policies → Apply profile. Changes take effect on the next request without a redeploy.

Start with the identity enforcement and supply chain profiles; both have very low false-positive rates on legitimate traffic. Enable indirect injection and cross-origin in Monitor mode for one week to review detections before switching to Block. Enable session escalation detection last, after you have calibrated the per-request thresholds.

| Profile | Recommended initial mode |
| --- | --- |
| Identity enforcement | Block |
| Supply chain | Block |
| Cross-origin | Monitor → Block after review |
| Indirect injection | Monitor → Block after review |
| Session escalation | Monitor → Block after review |
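One way to picture the staged rollout is as a per-profile mode map. The Python sketch below is hypothetical and does not reflect how Highflame stores or applies modes. The profile keys are taken from the .cedar file names on this page, and the sketch assumes that Monitor records a detection while letting the request through, whereas Block both records and denies it.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(Enum):
    MONITOR = "monitor"
    BLOCK = "block"


# Recommended initial modes from the table above.
INITIAL_MODES: Dict[str, Mode] = {
    "identity_enforcement": Mode.BLOCK,
    "supply_chain": Mode.BLOCK,
    "cross_origin": Mode.MONITOR,
    "inter_agent_injection": Mode.MONITOR,
    "escalation_detection": Mode.MONITOR,
}


def apply_mode(
    profile: str, violated: bool, modes: Dict[str, Mode] = INITIAL_MODES
) -> Tuple[bool, bool]:
    """Return (allowed, logged) for a profile's verdict under its current mode.

    ASSUMPTION: Monitor logs the detection and allows the request;
    Block logs the detection and denies it.
    """
    if not violated:
        return True, False
    return modes[profile] is Mode.MONITOR, True
```

After the review period, promoting a profile amounts to changing its entry to Mode.BLOCK, for example `modes = {**INITIAL_MODES, "cross_origin": Mode.BLOCK}`.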


Comparing A2A and MAS policy thresholds

A2A policies use tighter thresholds than the equivalent MAS policies because there is no central orchestrator to catch what slips through:

| Signal | A2A threshold | MAS threshold |
| --- | --- | --- |
| Session cumulative risk | 150 | 200 |
| Session threat turns (lockout) | 3 | 5 |
| Session injection peak | 70 | 80 |
| Tool poisoning (non-first-party) | 60 | 65 |

If you are running a hybrid architecture — some agents orchestrated, some operating peer-to-peer — apply A2A profiles to the agents participating in peer-to-peer communication and MAS profiles to sub-agents under centralized orchestration.
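For a hybrid deployment, the choice of threshold set can be sketched as a lookup keyed on how each agent communicates. The numbers below are the ones in the comparison table; the dictionary layout, the key names, and the assumption that the MAS lockout uses the same "greater than or equal" comparison as the A2A rule are illustrative only.

```python
from typing import Dict

# Threshold values from the comparison table; key names are hypothetical.
THRESHOLDS: Dict[str, Dict[str, int]] = {
    "a2a": {
        "session_cumulative_risk": 150,
        "session_threat_turns": 3,
        "session_injection_peak": 70,
        "tool_poisoning": 60,
    },
    "mas": {
        "session_cumulative_risk": 200,
        "session_threat_turns": 5,
        "session_injection_peak": 80,
        "tool_poisoning": 65,
    },
}


def profile_family(peer_to_peer: bool) -> str:
    """Peer-to-peer agents get A2A profiles; orchestrated sub-agents get MAS."""
    return "a2a" if peer_to_peer else "mas"


def threat_turn_lockout(threat_turns: int, peer_to_peer: bool) -> bool:
    """Check the lockout threshold for whichever profile family applies.

    ASSUMPTION: both families compare with >=, as the A2A rule does.
    """
    limit = THRESHOLDS[profile_family(peer_to_peer)]["session_threat_turns"]
    return threat_turns >= limit
```

Under these numbers, a session with three threat turns locks out a peer-to-peer agent but not an orchestrated sub-agent, which only reaches its lockout at five.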

