Setting Up Policies

This guide explains how Policy works and how enterprise administrators define, manage, and enforce security and governance rules for AI coding assistants and their integrations. After discovering which MCP servers and tools are in use (see Discovery and Metrics), administrators govern that usage through policy.

Why Policy-Driven Guardrails Matter

Policy-driven guardrails give organizations precise, enforceable control over how AI agents behave without relying on static rules or post-hoc review. By expressing intent as policy, enterprises can govern actions, tools, and data access consistently across users and environments.

The ability to test these guardrails interactively in the Playground is critical. Administrators can validate policy behavior against real scenarios, understand how rules interact, and iterate safely before rollout. This reduces misconfigurations, builds confidence in enforcement, and enables teams to move fast without compromising security.

Policy: Define, Manage, and Enforce Rules

The platform protects AI coding assistants such as Cursor, Claude Code, and GitHub Copilot, along with their connections to LLMs and MCP servers. To secure these interactions in a controlled and scalable way, organizations need to:

  1. Define what is allowed or forbidden by action type, content, tool, file path, or threat level.

  2. Manage policies in one place: enable or disable by category, edit rules, and deploy changes.

  3. Enforce consistently: every prompt, tool call, and file operation is evaluated against the same policy set before it runs.

Policy is designed to meet these needs. Rules are expressed in Cedar, a policy language built for authorization and used across the Highflame platform. The Overwatch console provides a dedicated Policy page where administrators can view all policy categories, manage rules, and deploy changes, ensuring that the same guardrails are enforced consistently by Overwatch agents across the organization.
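As a rough illustration, a Cedar rule in the Tool Permissions category might look like the following sketch. The entity and action names (`Action::"ExecuteShell"`, `resource.command`) are placeholders, not the platform's actual schema:

```cedar
// Illustrative only: entity types, action names, and attributes
// depend on the Cedar schema defined by the platform.
// Deny shell execution of destructive delete commands for everyone.
forbid (
  principal,
  action == Action::"ExecuteShell",
  resource
) when {
  resource.command like "rm -rf *"
};
```

Because Cedar evaluates forbid rules with priority over permit rules, a matching forbid always wins, which is what makes deny rules like this dependable as guardrails.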

Policy Categories

  • Secrets Detection: Detect and block credentials, tokens, API keys, and sensitive key patterns in prompts, tool calls, file operations, and AI responses.

  • PII Detection: Detect and block personally identifiable information (PII) such as credit card numbers, SSNs, and other sensitive personal data.

  • Semantic Threat Detection: Detect and block prompt injection, jailbreak attempts, and high-severity AI security threats.

  • Tool Permissions: Control access to shell execution, file operations, MCP servers, and sensitive system paths (e.g., block dangerous commands, restrict file paths).

  • Organization Rules: Organization-wide baselines such as default permit/deny, audit logging, user- or team-based permissions, and agent-specific guardrails.

  • Content Safety: Detect and control violent, harmful, hateful, sexual, and profane content using trust-and-safety classification.

  • Agent Security: Detect and block tool poisoning, rug pull attacks, and indirect prompt injection targeting AI coding agents (threats specific to agentic AI).

  • Custom: User-defined Cedar policies for specific use cases not covered by the built-in categories.
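A custom policy might, for instance, scope an MCP server to a single team. In this sketch the group and attribute names (`Team::"platform"`, `resource.server`) are hypothetical, chosen only for illustration:

```cedar
// Hypothetical schema: permit only the platform team to call
// tools on a specific production MCP server.
permit (
  principal in Team::"platform",
  action == Action::"CallMcpTool",
  resource
) when {
  resource.server == "prod-deploy-mcp"
};
```

Combined with an organization-wide default deny, a permit rule like this grants the action only to members of that team.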

Playground: Test Policies Safely

The Playground is an interactive environment where administrators can test how policies behave without affecting production traffic. It answers: “If a user asks the agent to do X, will our current policies allow or deny it?”

  • Validate before rollout: Safely test high-risk or sensitive actions, such as reading .env files or running destructive commands, and confirm they are allowed or blocked as intended before deploying policies organization-wide.

  • Understand policy behavior: See how policies interact by reviewing the active policy set, the Cedar policy preview, and the resulting allow or deny decision for each test scenario.

  • Iterate quickly: Enable or disable policies directly in the Playground, or return to the Policy page to edit and deploy changes. Then re-test in the Playground to validate behavior before enforcement.
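For example, to check the .env scenario, an administrator could confirm in the Playground that a rule like the following (with hypothetical action and attribute names) produces a Deny decision for any file read whose path ends in .env:

```cedar
// Hypothetical names: deny reads of any .env file, wherever it lives.
forbid (
  principal,
  action == Action::"ReadFile",
  resource
) when {
  resource.path like "*.env"
};
```

A Playground scenario simulating a read of `project/.env` should then come back as denied; if it comes back allowed, the rule or its deployment needs another iteration before enforcement.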
