Code Agent Policies

This guide explains how Policy works and how enterprise administrators define, manage, and enforce security and governance rules for AI coding assistants and their integrations. After discovering which MCP servers and tools are in use (see Discovery and Metrics), administrators govern that usage through policy.

Why Policy-Driven Guardrails Matter

Policy-driven guardrails give organizations precise, enforceable control over how AI agents behave without relying on static rules or post-hoc review. By expressing intent as policy, enterprises can govern actions, tools, and data access consistently across users and environments.

The ability to test these guardrails interactively in the Playground is critical. Administrators can validate policy behavior against real scenarios, understand how rules interact, and iterate safely before rollout. This reduces misconfigurations, builds confidence in enforcement, and enables teams to move fast without compromising security.

Policy: Define, Manage, and Enforce Rules

The platform protects AI coding assistants such as Cursor and Claude Code, and their connections to LLMs and MCP servers. To secure these interactions in a controlled and scalable way, organizations need to:

  1. Define what is allowed or forbidden by action type, content, tool, file path, or threat level.

  2. Manage policies in one place: enable or disable them by category, edit rules, and deploy changes.

  3. Enforce consistently: every prompt, tool call, and file operation is evaluated against the same policy set before it runs.

Policy is designed to meet these needs. Rules are expressed in Cedar, a policy language built for authorization and used across the Highflame platform. The Overwatch console provides a dedicated Policy page where administrators can view all policy categories, manage rules, and deploy changes, so that Overwatch agents enforce the same guardrails across the organization.
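As a rough illustration of how a rule reads in Cedar, the sketch below pairs a broad permit with a narrower forbid. The entity type `Agent`, the action `Action::"read_file"`, and the `resource.path` attribute are hypothetical placeholders, not the platform's actual schema. Two Cedar semantics are worth keeping in mind: a request is denied unless at least one `permit` matches, and any matching `forbid` overrides every `permit`.

```cedar
// Hypothetical schema: entity types, action names, and attributes are assumptions.

// Allow any agent to read files.
permit (
    principal is Agent,
    action == Action::"read_file",
    resource
);

// A matching forbid wins over the permit above:
// reads of paths ending in ".pem" are denied.
forbid (
    principal,
    action == Action::"read_file",
    resource
)
when { resource has path && resource.path like "*.pem" };
```

With these two policies, reading `src/main.py` would be allowed and reading `certs/server.pem` would be denied.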

Policy Categories

| Category | What it covers |
| --- | --- |
| Secrets Detection | Detect and block credentials, tokens, API keys, and sensitive key patterns in prompts, tool calls, file operations, and AI responses. |
| PII Detection | Detect and block personally identifiable information (PII) such as credit card numbers, SSNs, and other sensitive personal data. |
| Semantic Threat Detection | Detect and block prompt injection, jailbreak attempts, and high-severity AI security threats. |
| Tool Permissions | Control access to shell execution, file operations, MCP servers, and sensitive system paths (e.g. block dangerous commands, restrict file paths). |
| Organization Rules | Organization-wide baselines: default permit/deny, audit logging, user or team-based permissions, agent-specific guardrails. |
| Content Safety | Detect and control violent, harmful, hateful, sexual, and profane content using trust-and-safety classification. |
| Agent Security | Detect and block tool poisoning, rug pull attacks, and indirect prompt injection targeting AI coding agents (agentic AI–specific threats). |
| Custom | User-defined Cedar policies for specific use cases not covered by the built-in categories. |

Playground: Test Policies Safely

The Playground is an interactive environment where administrators can test how policies behave without affecting production traffic. It answers: “If a user asks the agent to do X, will our current policies allow or deny it?”

  • Validate before rollout: Safely test high-risk or sensitive actions, such as reading .env files or running destructive commands, and confirm they are allowed or blocked as intended before deploying policies organization-wide.

  • Understand policy behavior: See how policies interact by reviewing the active policy set, the Cedar policy preview, and the resulting allow or deny decision for each test scenario.

  • Iterate quickly: Enable or disable policies directly in the Playground, or return to the Policy page to edit and deploy changes. Then re-test in the Playground to validate behavior before enforcement.


Policy Profile Reference

The policy categories in the Overwatch console map to a set of Cedar policy files that run on each agent event. This section documents exactly what each category enforces, the thresholds applied, and which paths and patterns are protected.


Secrets Detection — secrets in file writes

Blocks any file write operation where secrets are detected in the content being written. This prevents an agent from persisting credentials to disk, whether it produced them or was manipulated into writing them.
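A minimal sketch of how such a rule could be written in Cedar. The action name `Action::"write_file"` and the boolean `context.secrets_detected` are assumptions made for illustration; the names used by the built-in policy file may differ.

```cedar
// Hypothetical names: Action::"write_file" and context.secrets_detected
// are assumptions, not the platform's documented schema.
forbid (
    principal,
    action == Action::"write_file",
    resource
)
when { context has secrets_detected && context.secrets_detected };
```

The `has` check avoids an evaluation error on events that do not carry the attribute. The rule is unscoped on `principal`, so it would apply to every agent and user.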


Tool Permissions — tool risk, loops, exfiltration, budget

Governs which tools are accessible, how many loop iterations are tolerated, which action patterns are treated as exfiltration, and what happens when a session exceeds its token budget.

| Rule | Condition | Notes |
| --- | --- | --- |
| Dangerous tool block | tool_risk_score > 85 or tool_category == "dangerous" | Unconditional; no trust level override |
| Shell execution block | tool_category == "shell" | Blocked by default; override with a scoped permit for specific agents |
| Sensitive tool threshold | tool_risk_score > 70 | Covers database tools, external API calls, and high-risk MCP tools |
| Loop detection | loop_detected == true and loop_count > 5 | Allows up to 5 retries before blocking |
| Exfiltration pattern | pattern_type in ["data_exfiltration", "secret_exfiltration"] | Blocks the step that attempts to transmit |
| High sequence risk | sequence_risk > 75 | Catches multi-step attack trajectories |
| Token budget overrun | budget_exceeded == true | Terminates the session; no further actions allowed |
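To show how conditions like these could translate into Cedar, here is a sketch of three of the rules. The thresholds and attribute names come from the table above; the action name `Action::"call_tool"` and the assumption that the attributes are exposed on `context` are hypothetical.

```cedar
// Hypothetical schema: Action::"call_tool" and the placement of these
// attributes on `context` are assumptions; thresholds come from the table.

// Dangerous tool block
forbid (principal, action == Action::"call_tool", resource)
when {
    (context has tool_risk_score && context.tool_risk_score > 85) ||
    (context has tool_category && context.tool_category == "dangerous")
};

// Loop detection: block once loop_count exceeds 5
forbid (principal, action, resource)
when {
    context has loop_detected && context has loop_count &&
    context.loop_detected && context.loop_count > 5
};

// Exfiltration pattern
forbid (principal, action, resource)
when {
    context has pattern_type &&
    ["data_exfiltration", "secret_exfiltration"].contains(context.pattern_type)
};
```

The table's `in [...]` notation is written here as `.contains()` because Cedar reserves `in` for entity hierarchy membership and tests set membership with `contains`.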

Shell execution is blocked by default. To permit shell access for a specific agent (e.g., a build automation agent), add a custom Cedar rule, scoped to that agent, in the Custom policy category.
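A minimal sketch of such a rule, assuming a hypothetical `Agent` entity type, an agent identifier `build-automation`, and an action named `Action::"call_tool"`. None of these names are confirmed by this guide; substitute the identifiers your deployment uses.

```cedar
// Hypothetical names: Agent::"build-automation" and Action::"call_tool"
// are placeholders for illustration only.
permit (
    principal == Agent::"build-automation",
    action == Action::"call_tool",
    resource
)
when { context has tool_category && context.tool_category == "shell" };
```

Because a Cedar `forbid` always overrides a `permit`, a rule like this only takes effect if the default shell block is the absence of a matching permit rather than an explicit `forbid`. The unconditional Dangerous tool block would still apply to this agent. Confirm the resulting decision in the Playground before deploying.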


Tool Permissions — filesystem path security

Blocks file reads and writes to paths that contain credentials, secrets, or sensitive system configuration.

Credential and configuration files:

| Pattern | Examples |
| --- | --- |
| Environment files | .env, .env.local, .env.production, any *.env* |
| Package manager credentials | .netrc, .npmrc, .pypirc |
| Container and orchestration | docker-config.json, ~/.kube/config, service account JSON files |
| Cloud provider credentials | AWS credentials/config, GCP application default credentials, Azure credentials |

Credential paths:

| Path pattern | What it protects |
| --- | --- |
| .ssh/* | SSH private keys and config |
| .aws/* | AWS credential files |
| .azure/* | Azure CLI credentials |
| .config/gcloud/* | GCP CLI credentials |
| .gnupg/* | GPG keys |
| *.pem, *.key | TLS/SSL private keys |
| id_rsa*, id_ed25519*, id_ecdsa* | SSH key files by name |

System paths:

| Path pattern | Why blocked |
| --- | --- |
| /etc/* | System configuration (passwd, sudoers, hosts) |
| /proc/* | Process information and kernel interfaces |
| /sys/* | Kernel and device configuration |
| /root/* | Root user home directory |
| /var/log/* | System logs (may contain credentials in plaintext) |
| /var/run/* | Runtime state (sockets, PIDs) |

File operations with action type delete, rmdir, unlink, or remove are also blocked to prevent an agent from being manipulated into deleting source code or build artifacts.
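The sketch below shows how a few of these path patterns and the destructive-action block could be expressed with Cedar's `like` operator, where `*` matches any run of characters. The action names and the `resource.path` attribute are assumptions for illustration, and only a subset of the protected patterns is shown.

```cedar
// Hypothetical schema: action names and resource.path are assumptions.

// Credential paths, key files, and a system path, for both reads and writes
forbid (
    principal,
    action in [Action::"read_file", Action::"write_file"],
    resource
)
when {
    resource has path && (
        resource.path like "*.ssh/*" ||
        resource.path like "*.aws/*" ||
        resource.path like "*.pem" ||
        resource.path like "*.key" ||
        resource.path like "/etc/*"
    )
};

// Destructive file operations are blocked regardless of path
forbid (
    principal,
    action in [Action::"delete", Action::"rmdir", Action::"unlink", Action::"remove"],
    resource
);
```

Since `*` in `like` also matches path separators, a pattern such as `*.ssh/*` matches the directory at any depth. Broad patterns can over-match, so test representative paths in the Playground.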


Agent Security — encoding attacks

Blocks tool calls and file writes whose content contains invisible Unicode characters, such as zero-width spaces and bidirectional text control characters. These characters can hide instructions in content that looks clean to a human reviewer but is still read by the model.

This is distinct from base64/URL-encoded injection, which Semantic Threat Detection covers. This rule specifically targets Unicode-based steganography in source code and file content.


Agent Security — MCP supply chain and indirect injection

Coding agents are particularly exposed to supply chain attacks because they load tool definitions from MCP servers at runtime. A poisoned MCP server can embed instructions in tool descriptions that redirect the agent's behavior before any prompt is evaluated.

| Rule | Threshold | What it catches |
| --- | --- | --- |
| MCP server poisoning | tool_poisoning_score >= 60 | Instructions hidden in tool descriptions or metadata |
| Indirect injection (general) | indirect_injection_score >= 70 | Injection payloads in tool outputs, file content, or retrieved documents |
| Indirect injection (sensitive tools) | indirect_injection_score >= 50 | Lower threshold when the next action would be a sensitive tool |
| Credential theft chain | pattern_type == "credential_theft" | Read credential → encode → exfiltrate sequence |
| Destructive sequence | pattern_type == "destructive_sequence" | Enumerate → bulk delete or overwrite sequence |

The indirect injection threshold for sensitive tools (50) is lower than the general threshold (70) because the consequences of a successfully injected call to a sensitive tool are more severe.
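The two-tier threshold can be sketched in Cedar as independent `forbid` rules. The score names and values come from the table above; `context.next_tool_sensitive` is an invented flag standing in for "the next action would be a sensitive tool", and exposing these values on `context` is likewise an assumption.

```cedar
// Hypothetical schema: context.next_tool_sensitive is an invented flag;
// score names and thresholds come from the table.

// MCP server poisoning
forbid (principal, action, resource)
when { context has tool_poisoning_score && context.tool_poisoning_score >= 60 };

// Indirect injection, general threshold
forbid (principal, action, resource)
when { context has indirect_injection_score && context.indirect_injection_score >= 70 };

// Indirect injection, lower threshold ahead of a sensitive tool
forbid (principal, action, resource)
when {
    context has indirect_injection_score &&
    context has next_tool_sensitive &&
    context.next_tool_sensitive &&
    context.indirect_injection_score >= 50
};
```

Under this sketch, an event with an indirect injection score of 55 passes the general rule but is denied when the next action is a sensitive tool. Any single matching `forbid` is enough to deny the request.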


For enterprise environments with stricter data handling requirements, combine the built-in policy categories with the Advanced Detection add-on via Highflame Studio > Shield > Policies > Apply profile:

  • advanced_detection — adds ML-based secret type detection (AWS key format, GCP service account JSON, SSH key structure) and bulk PII detection on top of the pattern-based defaults

  • multi_agent or a2a_security — if your coding agents are themselves orchestrated or communicate peer-to-peer across independent trust contexts

See Policy Templates for the full profile catalog.
