# Code Agent Policies

This guide explains how Policy works and how enterprise administrators define, manage, and enforce security and governance rules for AI coding assistants and their integrations. After discovering which MCP servers and tools are in use (see [Discovery and Metrics](/code-agents/discovery-and-metrics.md)), administrators govern that usage through policy.

### Why Policy-Driven Guardrails Matter

Policy-driven guardrails give organizations precise, enforceable control over how AI agents behave without relying on static rules or post-hoc review. By expressing intent as policy, enterprises can govern actions, tools, and data access consistently across users and environments.

The ability to test these guardrails interactively in the Playground is critical. Administrators can validate policy behavior against real scenarios, understand how rules interact, and iterate safely before rollout. This reduces misconfigurations, builds confidence in enforcement, and enables teams to move fast without compromising security.

### Policy: Define, Manage, and Enforce Rules

The platform protects AI coding assistants such as Cursor and Claude Code, along with their connections to LLMs and MCP servers. To secure these interactions in a controlled and scalable way, organizations need to:

1. **Define** what is allowed or forbidden, by action type, content, tool, file path, or threat level.
2. **Manage** policies in one place: enable or disable them by category, edit rules, and deploy changes.
3. **Enforce** consistently: every prompt, tool call, and file operation is evaluated against the same policy set before it runs.

Policy is designed to meet these needs. Rules are expressed in Cedar, a policy language built for authorization and used across the Highflame platform. The Overwatch console provides a dedicated Policy page where administrators can view all policy categories, manage rules, and deploy changes, so that Overwatch agents enforce the same guardrails consistently across the organization.

**Policy Categories**

<table><thead><tr><th width="183.0859375">Category</th><th>What it covers</th></tr></thead><tbody><tr><td><strong>Secrets Detection</strong></td><td>Detect and block credentials, tokens, API keys, and sensitive key patterns in prompts, tool calls, file operations, and AI responses.</td></tr><tr><td><strong>PII Detection</strong></td><td>Detect and block personally identifiable information (PII) such as credit card numbers, SSNs, and other sensitive personal data.</td></tr><tr><td><strong>Semantic Threat Detection</strong></td><td>Detect and block prompt injection, jailbreak attempts, and high-severity AI security threats.</td></tr><tr><td><strong>Tool Permissions</strong></td><td>Control access to shell execution, file operations, MCP servers, and sensitive system paths (e.g. block dangerous commands, restrict file paths).</td></tr><tr><td><strong>Organization Rules</strong></td><td>Organization-wide baselines: default permit/deny, audit logging, user or team-based permissions, agent-specific guardrails.</td></tr><tr><td><strong>Content Safety</strong></td><td>Detect and control violent, harmful, hateful, sexual, and profane content using trust-and-safety classification.</td></tr><tr><td><strong>Agent Security</strong></td><td>Detect and block tool poisoning, rug pull attacks, and indirect prompt injection targeting AI coding agents (agentic AI–specific threats).</td></tr><tr><td><strong>Custom</strong></td><td>User-defined Cedar policies for specific use cases not covered by the built-in categories.</td></tr></tbody></table>

<figure><img src="/files/vd1cq7xmIzpUmjbu673X" alt=""><figcaption></figcaption></figure>

#### Playground: Test Policies Safely <a href="#playground-test-policies-safely" id="playground-test-policies-safely"></a>

The **Playground** is an **interactive environment** where administrators can test how policies behave without affecting production traffic. It answers: *“If a user asks the agent to do X, will our current policies allow or deny it?”*

* **Validate before rollout:**\
  Safely test high-risk or sensitive actions, such as reading `.env` files or running destructive commands, and confirm they are allowed or blocked as intended before deploying policies organization-wide.
* **Understand policy behavior:**\
  See how policies interact by reviewing the active policy set, the Cedar policy preview, and the resulting allow or deny decision for each test scenario.
* **Iterate quickly:**\
  Enable or disable policies directly in the Playground, or return to the Policy page to edit and deploy changes. Then re-test in the Playground to validate behavior before enforcement.

<figure><img src="/files/6jsLYfTfnRTA1JytK0UM" alt=""><figcaption></figcaption></figure>

***

## Policy profile reference

The policy categories in the Overwatch console map to a set of Cedar policy files that run on each agent event. This section documents exactly what each category enforces, the thresholds applied, and which paths and patterns are protected.

***

### Secrets Detection — secrets in file writes

Blocks any file write operation where secrets are detected in the content being written. This prevents an agent from persisting credentials to disk, whether it produced them or was manipulated into writing them.

```cedar
@id("code-block-write-secrets")
@severity("high")
forbid(principal, action == Guardrails::Action::"write_file", resource)
when { context has contains_secrets && context.contains_secrets == true };
```

***

### Tool Permissions — tool risk, loops, exfiltration, budget

Governs which tools are accessible, when loops are detected, which action patterns indicate exfiltration, and when a session has exceeded its token budget.

| Rule                     | Condition                                                      | Notes                                                                   |
| ------------------------ | -------------------------------------------------------------- | ----------------------------------------------------------------------- |
| Dangerous tool block     | `tool_risk_score > 85` or `tool_category == "dangerous"`       | Unconditional — no trust level override                                 |
| Shell execution block    | `tool_category == "shell"`                                     | Blocked by default; override with a scoped `permit` for specific agents |
| Sensitive tool threshold | `tool_risk_score > 70`                                         | Covers database tools, external API calls, and high-risk MCP tools      |
| Loop detection           | `loop_detected == true` and `loop_count > 5`                   | Allows up to 5 retries before blocking                                  |
| Exfiltration pattern     | `pattern_type in ["data_exfiltration", "secret_exfiltration"]` | Blocks the step that attempts to transmit                               |
| High sequence risk       | `sequence_risk > 75`                                           | Catches multi-step attack trajectories                                  |
| Token budget overrun     | `budget_exceeded == true`                                      | Terminates the session — no further actions allowed                     |
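
To make the table concrete, here is a minimal sketch of how three of these rows might be written in Cedar. It is illustrative only: the attribute names are copied from the Condition column, the `call_tool` action is taken from the shell example on this page, and the `@id` values are hypothetical. The policy files that actually ship with the platform may be structured differently.

```cedar
// Sketch only: attribute names mirror the Condition column above;
// the @id values are hypothetical.

// Dangerous tool block: high risk score or an explicit "dangerous" category.
@id("sketch-block-dangerous-tool")
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when {
    (context has tool_risk_score && context.tool_risk_score > 85) ||
    (context has tool_category && context.tool_category == "dangerous")
};

// Loop detection: block once the loop counter passes 5.
@id("sketch-block-loop")
forbid(principal, action, resource)
when {
    context has loop_detected && context.loop_detected == true &&
    context has loop_count && context.loop_count > 5
};

// Exfiltration pattern: Cedar's `in` operator works on entities, so
// membership in a set of strings is written with `.contains`.
@id("sketch-block-exfiltration")
forbid(principal, action, resource)
when {
    context has pattern_type &&
    ["data_exfiltration", "secret_exfiltration"].contains(context.pattern_type)
};
```

Each condition is guarded with `context has ...` so that a rule is skipped, rather than erroring, when an event does not carry that attribute.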

Shell execution is blocked by default. To permit shell access for a specific agent (e.g., a build automation agent), add a custom Cedar rule in the **Custom** policy category:

```cedar
permit(
    principal == Guardrails::Agent::"agent:acme:build-bot:v1",
    action == Guardrails::Action::"call_tool",
    resource
)
when { context has tool_category && context.tool_category == "shell" };
```
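
One Cedar behavior is worth keeping in mind when writing overrides: a matching `forbid` always wins over a matching `permit`, and a request with no matching `permit` is denied by default. This page does not show how the built-in shell rule is written, so the following is only a sketch of one way an explicit block could carve out an exemption, assuming the block is expressed as a `forbid`. The `@id` value is hypothetical.

```cedar
// Hypothetical shape of a shell block that exempts a single agent.
// The exemption lives on the forbid itself because, in Cedar,
// a permit cannot override a forbid that matches the same request.
@id("sketch-block-shell-except-build-bot")
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when { context has tool_category && context.tool_category == "shell" }
unless { principal == Guardrails::Agent::"agent:acme:build-bot:v1" };
```

After deploying a custom shell rule, test it in the Playground with both the exempted agent and another agent to confirm the resulting decisions.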

***

### Tool Permissions — filesystem path security

Blocks file reads and writes to paths that contain credentials, secrets, or sensitive system configuration.

**Credential and configuration files:**

| Pattern                     | Examples                                                                       |
| --------------------------- | ------------------------------------------------------------------------------ |
| Environment files           | `.env`, `.env.local`, `.env.production`, any `*.env*`                          |
| Package manager credentials | `.netrc`, `.npmrc`, `.pypirc`                                                  |
| Container and orchestration | `docker-config.json`, `~/.kube/config`, service account JSON files             |
| Cloud provider credentials  | AWS credentials/config, GCP application default credentials, Azure credentials |

**Credential paths:**

| Path pattern                          | What it protects            |
| ------------------------------------- | --------------------------- |
| `.ssh/*`                              | SSH private keys and config |
| `.aws/*`                              | AWS credential files        |
| `.azure/*`                            | Azure CLI credentials       |
| `.config/gcloud/*`                    | GCP CLI credentials         |
| `.gnupg/*`                            | GPG keys                    |
| `*.pem`, `*.key`                      | TLS/SSL private keys        |
| `id_rsa*`, `id_ed25519*`, `id_ecdsa*` | SSH key files by name       |

**System paths:**

| Path pattern | Why blocked                                        |
| ------------ | -------------------------------------------------- |
| `/etc/*`     | System configuration (passwd, sudoers, hosts)      |
| `/proc/*`    | Process information and kernel interfaces          |
| `/sys/*`     | Kernel and device configuration                    |
| `/root/*`    | Root user home directory                           |
| `/var/log/*` | System logs (may contain credentials in plaintext) |
| `/var/run/*` | Runtime state (sockets, PIDs)                      |

File operations with action type `delete`, `rmdir`, `unlink`, or `remove` are also blocked to prevent an agent from being manipulated into deleting source code or build artifacts.
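
As an illustration of how path patterns like these can be expressed, the sketch below uses Cedar's `like` operator, where `*` matches any sequence of characters. The `context.file_path` attribute and the `read_file` action are assumed names (only `write_file` appears elsewhere on this page), the `@id` is hypothetical, and only a subset of the patterns is shown.

```cedar
// Sketch only: `context.file_path` and the `read_file` action are assumptions.
@id("sketch-block-credential-paths")
forbid(
    principal,
    action in [
        Guardrails::Action::"read_file",
        Guardrails::Action::"write_file"
    ],
    resource
)
when {
    context has file_path &&
    (
        context.file_path like "*.env*" ||   // .env, .env.local, .env.production
        context.file_path like "*/.ssh/*" || // SSH keys and config
        context.file_path like "*/.aws/*" || // AWS credential files
        context.file_path like "*.pem" ||    // TLS/SSL private keys
        context.file_path like "*.key" ||
        context.file_path like "/etc/*"      // system configuration
    )
};
```

Wildcard patterns are broad by design: `*.env*` also matches a file such as `app.environment.ts`, and `*/.ssh/*` only matches when the path contains a leading directory separator. How tightly to anchor each pattern depends on whether paths are normalized to absolute form before evaluation, which this page does not specify.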

***

### Agent Security — encoding attacks

Blocks tool calls and file writes where the content contains invisible Unicode characters — zero-width spaces, bidirectional text control characters, and similar. These characters are used to hide instructions in content that appears clean to a human reviewer but is interpreted by the model.

This category is distinct from base64/URL-encoded injection, which Semantic Threat Detection covers. It specifically targets Unicode-based steganography in source code and file content.

***

### Agent Security — MCP supply chain and indirect injection

Coding agents are particularly exposed to supply chain attacks because they load tool definitions from MCP servers at runtime. A poisoned MCP server can embed instructions in tool descriptions that redirect the agent's behavior before any prompt is evaluated.

| Rule                                 | Threshold                                | What it catches                                                          |
| ------------------------------------ | ---------------------------------------- | ------------------------------------------------------------------------ |
| MCP server poisoning                 | `tool_poisoning_score >= 60`             | Instructions hidden in tool descriptions or metadata                     |
| Indirect injection (general)         | `indirect_injection_score >= 70`         | Injection payloads in tool outputs, file content, or retrieved documents |
| Indirect injection (sensitive tools) | `indirect_injection_score >= 50`         | Lower threshold when the next action would be a sensitive tool           |
| Credential theft chain               | `pattern_type == "credential_theft"`     | Read credential → encode → exfiltrate sequence                           |
| Destructive sequence                 | `pattern_type == "destructive_sequence"` | Enumerate → bulk delete or overwrite sequence                            |

The indirect injection threshold for sensitive tools (50) is lower than the general threshold (70) because the consequences of a successfully injected call to a sensitive tool are more severe.
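
The two-tier threshold can be sketched in Cedar as a pair of rules. This is an illustration under stated assumptions: the score attribute names come from the table above, a "sensitive tool" is taken to mean `tool_risk_score > 70` (borrowed from the Tool Permissions table), and the `@id` values are hypothetical.

```cedar
// Sketch only: assumes "sensitive tool" means tool_risk_score > 70.

// General threshold: applies to every action.
@id("sketch-indirect-injection-general")
forbid(principal, action, resource)
when {
    context has indirect_injection_score &&
    context.indirect_injection_score >= 70
};

// Stricter threshold: applies only when the call targets a sensitive tool.
@id("sketch-indirect-injection-sensitive")
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when {
    context has indirect_injection_score &&
    context.indirect_injection_score >= 50 &&
    context has tool_risk_score &&
    context.tool_risk_score > 70
};
```

Because the general rule already blocks scores of 70 and above, the second rule effectively adds the 50 to 69 band for sensitive tools only.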

***

## Recommended additions

For enterprise environments with stricter data handling requirements, combine the built-in policy categories with one or more add-on profiles, applied via **Highflame Studio** → **Shield** → **Policies** → **Apply profile**:

* **`advanced_detection`**: adds ML-based secret type detection (AWS key format, GCP service account JSON, SSH key structure) and bulk PII detection on top of the pattern-based defaults.
* **`multi_agent` or `a2a_security`**: apply one of these if your coding agents are themselves orchestrated or communicate peer-to-peer across independent trust contexts.

See [Policy Templates](/agent-authorization-and-control-shield/policy-templates.md) for the full profile catalog.

