# A2A

Agent-to-Agent (A2A) policies enforce trust boundaries in multi-agent systems where agents communicate peer-to-peer rather than through a central orchestrator. They extend the standard Guardrails policy model with identity-aware rules that account for the unique threat surface of inter-agent communication.

***

## Why A2A policies are different

In a traditional orchestrated multi-agent system (MAS), a central orchestrator validates sub-agents and controls their access. A2A systems are different: agents operate independently, each agent self-reports its identity, and agents may receive inputs from other agents they do not fully control.

This creates attack vectors that standard prompt and tool guardrails do not address:

| Threat                          | How it arises in A2A                                                                        |
| ------------------------------- | ------------------------------------------------------------------------------------------- |
| **Identity spoofing**           | An agent claims a trust level or type it does not have, gaining elevated access             |
| **Indirect injection**          | A malicious payload is embedded in another agent's output and consumed downstream           |
| **Confused deputy**             | An agent is manipulated into making cross-origin requests outside its intended trust domain |
| **Supply chain attacks**        | Tool descriptions in MCP servers are poisoned to redirect agent behavior                    |
| **Behavioral drift (rug pull)** | An agent behaves legitimately until it has established enough trust, then changes behavior  |
| **Session escalation**          | Low-severity signals accumulate across a session until the risk threshold is exceeded       |

A2A policies detect and block all of these at Cedar policy evaluation time, before any tool executes or any response is returned.

***

## Trust model

Every A2A policy evaluation is grounded in the calling agent's identity, which flows from a [ZeroID](/agent-identity-zeroid/introduction.md) JWT into the Cedar evaluation context. Three fields determine how restrictive the applicable policies are:

### Trust level

| Value                  | Meaning                                                | Typical source                                     |
| ---------------------- | ------------------------------------------------------ | -------------------------------------------------- |
| `first_party`          | Agent owned and operated by your organization          | Your own agents registered in ZeroID               |
| `verified_third_party` | Independently verified agent from an audited publisher | Verified partner agents with ZeroID attestation    |
| `unverified`           | No verification — identity is self-asserted only       | External agents, community models, unknown callers |

### Agent type

| Value          | Meaning                                                          |
| -------------- | ---------------------------------------------------------------- |
| `autonomous`   | Operates without human oversight — highest risk, most restricted |
| `tool_agent`   | Executes specific tools under bounded scope                      |
| `human_proxy`  | Acts on behalf of a human in the session loop                    |
| `orchestrator` | Routes to sub-agents; subject to orchestrator-specific rules     |

### Agent framework

A framework declaration (`claude-code`, `langchain`, `crewai`, etc.) is required for any `unverified` agent requesting sensitive operations. An `unverified` agent that omits `agent_framework` is blocked from those operations.

### How identity reaches Cedar

ZeroID issues a JWT containing agent identity claims. The Shield SDK and Agent Gateway project these claims into the Cedar evaluation context automatically:

```json
{
  "agent_id": "agent:acme:code-assistant:v1",
  "agent_type": "autonomous",
  "agent_trust_level": "verified_third_party",
  "agent_framework": "claude-code",
  "agent_publisher": "acme"
}
```

These fields are available in every Cedar policy as `context.agent_id`, `context.agent_trust_level`, etc.
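
As a rough illustration of that projection, the sketch below copies the identity claims from an already-verified JWT payload into a context dictionary. The helper name `project_identity_claims` and the assumption that JWT claim names match the context field names are hypothetical. The Shield SDK and Agent Gateway perform this step for you, and their actual implementation may differ.

```python
# Hypothetical sketch: not part of the Shield SDK or Agent Gateway.
IDENTITY_FIELDS = (
    "agent_id",
    "agent_type",
    "agent_trust_level",
    "agent_framework",
    "agent_publisher",
)


def project_identity_claims(claims: dict) -> dict:
    """Copy only the identity fields that are present in the claims.

    Absent fields are left out rather than defaulted, so a policy guard such
    as `context has agent_framework` can tell "missing" apart from "set".
    """
    return {name: claims[name] for name in IDENTITY_FIELDS if name in claims}


claims = {
    "sub": "agent:acme:code-assistant:v1",  # unrelated claims are ignored
    "agent_id": "agent:acme:code-assistant:v1",
    "agent_type": "autonomous",
    "agent_trust_level": "verified_third_party",
    "agent_framework": "claude-code",
    "agent_publisher": "acme",
}
context = project_identity_claims(claims)
print(context["agent_trust_level"])  # verified_third_party
```

Leaving missing claims out of the context, instead of filling in defaults, is a design choice in this sketch: it keeps "no framework declared" distinguishable from "framework declared".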

***

## Policy profiles

A2A policies are organized into five profiles. Each profile can be applied independently. All five are available in `highflame-policy` under `schemas/guardrails/templates/profiles/a2a_security/`.

***

### 1. Identity enforcement

**Profile:** `identity_enforcement.cedar`

Prevents agents from operating with incomplete or inconsistent identity claims. Incomplete identity is treated as a spoofing indicator — not as a benign configuration gap.

| Rule                      | Condition                              | Action blocked   |
| ------------------------- | -------------------------------------- | ---------------- |
| Anonymous agent           | `agent_type` present, `agent_id` empty | Tool calls       |
| Unregistered framework    | `unverified` + no `agent_framework`    | Sensitive tools  |
| Unverified MCP connection | `agent_trust_level == "unverified"`    | `connect_server` |
| Autonomous + unverified   | `autonomous` type + `unverified` trust | All tool calls   |

The most critical rule is the last one: autonomous agents with unverified trust are unconditionally blocked from all tool execution. This combination — no human oversight and no identity attestation — is treated as too high a risk to allow regardless of payload content.

```cedar
@id("a2a-block-autonomous-unverified")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has agent_type && context.agent_type == "autonomous" &&
    context has agent_trust_level && context.agent_trust_level == "unverified"
};
```
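
To make the rule table concrete, here is a minimal Python model of its four conditions. It illustrates the logic only and is not how the policy engine evaluates Cedar. The function name `identity_violations`, the rule labels, and the `sensitive` flag (standing in for however sensitive tools are identified) are assumptions.

```python
def identity_violations(context: dict, action: str, sensitive: bool = False) -> list[str]:
    """Return labels for the identity rules in the table that would block this request."""
    unverified = context.get("agent_trust_level") == "unverified"
    hits = []
    if action == "call_tool":
        # Anonymous agent: a type is declared but the ID is missing or empty.
        if "agent_type" in context and not context.get("agent_id"):
            hits.append("anonymous_agent")
        # Unregistered framework: only matters for sensitive tools.
        if unverified and sensitive and not context.get("agent_framework"):
            hits.append("unregistered_framework")
        # Autonomous + unverified: blocked for every tool call.
        if unverified and context.get("agent_type") == "autonomous":
            hits.append("autonomous_unverified")
    if action == "connect_server" and unverified:
        hits.append("unverified_mcp_connection")
    return hits


ctx = {
    "agent_id": "agent:x",
    "agent_type": "autonomous",
    "agent_trust_level": "unverified",
}
print(identity_violations(ctx, "call_tool"))       # ['autonomous_unverified']
print(identity_violations(ctx, "connect_server"))  # ['unverified_mcp_connection']
```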

***

### 2. Indirect injection detection

**Profile:** `inter_agent_injection.cedar`

Detects injection payloads that arrive through another agent's outputs rather than directly from a user. This is the most common A2A-specific attack vector: a downstream agent consumes tool results or RAG retrievals that have been poisoned by an upstream source the receiving agent does not control.

The `indirect_injection_score` context field is produced by a deep-context detector that analyzes multi-turn patterns across the session, including encoded payloads (base64, hex) that attempt to bypass text-based filters.

| Rule                               | Threshold                          | Action blocked       |
| ---------------------------------- | ---------------------------------- | -------------------- |
| General indirect injection         | `indirect_injection_score >= 60`   | Tool calls           |
| Sensitive tools with weaker signal | `indirect_injection_score >= 40`   | High-risk tools only |
| Multi-turn progressive attack      | Session GRU model flags escalation | Tool calls + prompts |
| Encoded payload                    | Encoding pattern detected          | All actions          |

```cedar
@id("a2a-indirect-injection-agent")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has indirect_injection_score &&
    context.indirect_injection_score >= 60
};
```
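
The two score-based rows of the table form a two-tier threshold: a weaker signal is enough to block a high-risk tool. The sketch below models just that tiering. The function name `blocks_tool_call` and the `high_risk_tool` flag are hypothetical, and the session-model and encoded-payload rows are not modeled.

```python
from typing import Optional

GENERAL_THRESHOLD = 60    # blocks any tool call
SENSITIVE_THRESHOLD = 40  # blocks high-risk tools only


def blocks_tool_call(indirect_injection_score: Optional[int], high_risk_tool: bool) -> bool:
    """Apply the stricter threshold when the target tool is high-risk."""
    if indirect_injection_score is None:
        # Same effect as the `context has indirect_injection_score` guard:
        # a missing score never triggers the rule.
        return False
    threshold = SENSITIVE_THRESHOLD if high_risk_tool else GENERAL_THRESHOLD
    return indirect_injection_score >= threshold


print(blocks_tool_call(45, high_risk_tool=False))  # False
print(blocks_tool_call(45, high_risk_tool=True))   # True
```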

***

### 3. Cross-origin (confused deputy)

**Profile:** `cross_origin.cedar`

A confused deputy attack manipulates an agent into proxying requests across trust boundaries — for example, using an agent's credentials to access an internal resource on behalf of an external caller, or forwarding a response to a domain outside the agent's intended scope.

The `cross_origin_score` context field scores the degree of cross-origin mixing in the request, from mixed HTTP/HTTPS schemes (score: 70) up to direct URL injection in tool parameters (score: 85) or localhost-plus-external mixing (score: 90).

| Rule                                        | Threshold | Scope                    |
| ------------------------------------------- | --------- | ------------------------ |
| Critical cross-origin                       | `>= 80`   | All agents, all actions  |
| Moderate cross-origin for unverified agents | `>= 60`   | `unverified` agents only |
| Server connection with cross-origin signal  | `>= 65`   | `connect_server` action  |
| Sensitive tools with cross-origin present   | `>= 60`   | High-risk tools          |

```cedar
@id("a2a-cross-origin-block-critical")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action in [Guardrails::Action::"process_prompt", Guardrails::Action::"call_tool"],
    resource
)
when {
    context has cross_origin_score && context.cross_origin_score >= 80
};
```
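
Because several rows of the table can match the same request, the effective blocking threshold is the lowest one that applies. The following sketch computes it, following the table's scopes. The function name and the `high_risk_tool` flag are hypothetical, and the sketch assumes the rule for `unverified` agents applies to every action, which the table does not state explicitly.

```python
def cross_origin_threshold(trust_level: str, action: str, high_risk_tool: bool = False) -> int:
    """Lowest cross_origin_score that would block, taking the strictest matching row."""
    candidates = [80]  # critical rule: applies to every agent
    if trust_level == "unverified":
        candidates.append(60)
    if action == "connect_server":
        candidates.append(65)
    if action == "call_tool" and high_risk_tool:
        candidates.append(60)
    return min(candidates)


mixed_scheme_score = 70  # mixed HTTP/HTTPS, per the scoring examples above
print(mixed_scheme_score >= cross_origin_threshold("first_party", "call_tool"))  # False
print(mixed_scheme_score >= cross_origin_threshold("unverified", "call_tool"))   # True
```

In this model, the same mixed-scheme request passes for a first-party agent but is blocked for an unverified one.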

***

### 4. Supply chain protection

**Profile:** `supply_chain.cedar`

Covers attacks targeting the tool ecosystem that agents rely on, specifically:

**Tool poisoning**: hidden instructions embedded in tool descriptions, metadata, or system prompts. Because tool descriptions are loaded by the agent (not typed by a user), poisoned descriptions can redirect agent behavior without any visible prompt manipulation. The `tool_poisoning_score` field is produced by a description-aware detector.

**Rug pull (behavioral drift)**: an agent or tool behaves correctly until it has accumulated enough trust or context, then pivots to a malicious objective. The `rug_pull_detected` flag and `rug_pull_score` capture sudden deviations in behavior patterns.

**Credential theft chains**: multi-step sequences such as reading a credential file, encoding it, then exfiltrating it via a tool call to an external endpoint. The `suspicious_pattern` and `pattern_type` context fields identify these sequences.

| Rule                               | Condition                                             | Scope            |
| ---------------------------------- | ----------------------------------------------------- | ---------------- |
| Tool poisoning                     | `tool_poisoning_score >= 60`, non-first-party         | Tool calls       |
| Server poisoning (lower threshold) | `tool_poisoning_score >= 55`, non-first-party         | `connect_server` |
| Rug pull                           | `rug_pull_score >= 70`                                | Tool calls       |
| Credential theft chain             | `pattern_type == "credential_theft"`, non-first-party | Tool calls       |

```cedar
@id("a2a-tool-poisoning-agent")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action == Guardrails::Action::"call_tool",
    resource
)
when {
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has tool_poisoning_score && context.tool_poisoning_score >= 60
};
```
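
One way to read the table is as a list of independent rules, each with its own action, trust exemption, and condition. The data-driven sketch below models it that way. The rule labels and the function name `supply_chain_hits` are hypothetical. As in the Cedar example, a non-first-party rule only fires when `agent_trust_level` is present.

```python
RULES = [
    # (label, action, requires non-first-party?, condition over the context)
    ("tool_poisoning", "call_tool", True, lambda c: c.get("tool_poisoning_score", 0) >= 60),
    ("server_poisoning", "connect_server", True, lambda c: c.get("tool_poisoning_score", 0) >= 55),
    ("rug_pull", "call_tool", False, lambda c: c.get("rug_pull_score", 0) >= 70),
    ("credential_theft", "call_tool", True, lambda c: c.get("pattern_type") == "credential_theft"),
]


def supply_chain_hits(context: dict, action: str) -> list[str]:
    """Return the labels of every rule in RULES that would block this request."""
    trust = context.get("agent_trust_level")
    non_first_party = trust is not None and trust != "first_party"
    return [
        label
        for label, rule_action, needs_non_first_party, matches in RULES
        if rule_action == action
        and (non_first_party or not needs_non_first_party)
        and matches(context)
    ]


ctx = {"agent_trust_level": "verified_third_party", "tool_poisoning_score": 57}
print(supply_chain_hits(ctx, "connect_server"))  # ['server_poisoning']
print(supply_chain_hits(ctx, "call_tool"))       # []
```

A score of 57 is enough to block a server connection but not a tool call, which reflects the lower `connect_server` threshold in the table.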

***

### 5. Session escalation detection

**Profile:** `escalation_detection.cedar`

Monitors session-level risk scores rather than per-request scores. This catches attacks that distribute their payload across multiple low-scoring turns to stay below per-request thresholds.

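Session aggregate fields drive these policies. The three main ones are listed below; the session jailbreak peak rule additionally reads `session_max_jailbreak_score`.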

| Field                           | What it tracks                                         |
| ------------------------------- | ------------------------------------------------------ |
| `session_max_injection_score`   | Peak injection score observed in the session so far    |
| `session_cumulative_risk_score` | Running total of risk contributions across all turns   |
| `session_threat_turns`          | Number of turns where at least one threat signal fired |

The thresholds are deliberately lower than the equivalent MAS thresholds, because A2A sessions lack a central orchestrator that can intervene between turns.

| Rule                            | Threshold                                            | Scope                |
| ------------------------------- | ---------------------------------------------------- | -------------------- |
| Session injection peak          | `session_max_injection_score >= 70`, non-first-party | Tool calls + prompts |
| Session jailbreak peak          | `session_max_jailbreak_score >= 70`, non-first-party | Tool calls + prompts |
| Cumulative risk circuit breaker | `session_cumulative_risk_score > 150`                | Sensitive tools      |
| Threat turn lockout             | `session_threat_turns >= 3`, unverified              | All tool calls       |

```cedar
@id("a2a-session-injection-peak-block")
@severity("critical")
forbid(
    principal is Guardrails::Agent,
    action in [Guardrails::Action::"call_tool", Guardrails::Action::"process_prompt"],
    resource
)
when {
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has session_max_injection_score &&
    context.session_max_injection_score >= 70
};
```
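
To show why session aggregates catch what per-request checks miss, the sketch below folds four turns into the three aggregate fields. How the platform computes each turn's risk contribution is not specified here, so the per-turn numbers are made-up inputs and the `SessionAggregates` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionAggregates:
    session_max_injection_score: int = 0
    session_cumulative_risk_score: int = 0
    session_threat_turns: int = 0

    def record_turn(self, injection_score: int, risk: int, threat_fired: bool) -> None:
        """Fold one turn's signals into the running aggregates."""
        self.session_max_injection_score = max(
            self.session_max_injection_score, injection_score
        )
        self.session_cumulative_risk_score += risk
        if threat_fired:
            self.session_threat_turns += 1


session = SessionAggregates()
# Four turns (injection score, risk contribution, threat signal fired),
# each staying under a per-request injection threshold of 60.
turns = [(35, 40, False), (45, 40, True), (50, 40, True), (55, 40, True)]
for injection_score, risk, threat_fired in turns:
    session.record_turn(injection_score, risk, threat_fired)

print(session.session_max_injection_score)    # 55: peak rule (>= 70) does not fire
print(session.session_cumulative_risk_score)  # 160: circuit breaker (> 150) fires
print(session.session_threat_turns)           # 3: lockout (>= 3) fires for unverified agents
```

No single turn in this example would be blocked on its own, yet the cumulative risk and threat-turn rules both trigger by the fourth turn.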

***

## Applying A2A policy profiles

A2A profiles are Cedar policy files. Apply them in the same way as any other guardrail policy — load them into the policy engine alongside your existing rules.

{% tabs %}
{% tab title="Python" %}

```python
from highflame.shield import GuardrailsClient

client = GuardrailsClient(api_key="...")

# Load a specific A2A profile
client.policies.load_profile("a2a_security/identity_enforcement")
client.policies.load_profile("a2a_security/supply_chain")

# Or load all five profiles at once
client.policies.load_profile("a2a_security/*")
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import { GuardrailsClient } from "@highflame/sdk";

const client = new GuardrailsClient({ apiKey: "..." });

// Load a specific A2A profile
await client.policies.loadProfile("a2a_security/identity_enforcement");
await client.policies.loadProfile("a2a_security/supply_chain");

// Or load all five profiles at once
await client.policies.loadProfile("a2a_security/*");
```

{% endtab %}
{% endtabs %}

Profiles can also be applied in **Highflame Studio** → **Shield** → **Policies** → **Apply profile**. Changes take effect on the next request without a redeploy.

### Recommended rollout

Start with the identity enforcement and supply chain profiles; both have very low false-positive rates on legitimate traffic. Run indirect injection and cross-origin in Monitor mode for one week and review the detections before switching them to Block. Enable session escalation detection last, after you have calibrated the per-request thresholds.

| Profile              | Recommended initial mode     |
| -------------------- | ---------------------------- |
| Identity enforcement | Block                        |
| Supply chain         | Block                        |
| Cross-origin         | Monitor → Block after review |
| Indirect injection   | Monitor → Block after review |
| Session escalation   | Monitor → Block after review |

***

## Comparing A2A and MAS policy thresholds

A2A policies use tighter thresholds than the equivalent MAS policies because there is no central orchestrator to catch what slips through:

| Signal                           | A2A threshold | MAS threshold |
| -------------------------------- | ------------- | ------------- |
| Session cumulative risk          | 150           | 200           |
| Session threat turns (lockout)   | 3             | 5             |
| Session injection peak           | 70            | 80            |
| Tool poisoning (non-first-party) | 60            | 65            |

If you are running a hybrid architecture — some agents orchestrated, some operating peer-to-peer — apply A2A profiles to the agents participating in peer-to-peer communication and MAS profiles to sub-agents under centralized orchestration.
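
For a hybrid deployment, the difference comes down to which column of the table applies to a given agent. The sketch below encodes the table as data and looks up a threshold by topology. The dictionary keys and the function name `threshold_for` are hypothetical labels, not configuration keys from the product.

```python
# Values copied from the comparison table above.
THRESHOLDS = {
    "a2a": {
        "session_cumulative_risk": 150,
        "session_threat_turns": 3,
        "session_injection_peak": 70,
        "tool_poisoning": 60,
    },
    "mas": {
        "session_cumulative_risk": 200,
        "session_threat_turns": 5,
        "session_injection_peak": 80,
        "tool_poisoning": 65,
    },
}


def threshold_for(orchestrated: bool, signal: str) -> int:
    """Pick the MAS column for orchestrated sub-agents, the A2A column for peers."""
    family = "mas" if orchestrated else "a2a"
    return THRESHOLDS[family][signal]


peak = 75  # an example session injection peak
print(peak >= threshold_for(orchestrated=False, signal="session_injection_peak"))  # True
print(peak >= threshold_for(orchestrated=True, signal="session_injection_peak"))   # False
```

The same session peak of 75 crosses the A2A threshold for a peer-to-peer agent but stays under the MAS threshold for an orchestrated sub-agent.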

***

## Related

* [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — patterns for writing custom A2A policy rules
* [Guardrail Evaluations](/agent-authorization-and-control-shield/guardrails-policies/bounded-functional-units.md) — how the evaluation lifecycle works
* [Agent Identity (ZeroID)](/agent-identity-zeroid/introduction.md) — issuing and managing agent identities
* [Agent Delegation](/agent-identity-zeroid/guides/agent-delegation.md) — scoping delegated identity in multi-agent workflows


