# Threat Response

This guide walks enterprise administrators through responding to threat events detected by Overwatch across AI coding assistants and their MCP server integrations. It covers how threats appear in the console, how to triage them, what actions to take by severity, and how to tune policies after a false positive.

If you have not yet configured policies, start with [Setting Up Policies](/code-agents/setting-up-policies.md). For understanding which MCP servers are in use and who is using them, see [Discovery & Metrics](/code-agents/discovery-and-metrics.md).

***

## Overview: What Threat Events Look Like

When Overwatch detects a policy violation or security signal, it records a structured threat event. These events are visible in real time in the Highflame Studio console under **Threats & Violations**. Each event represents a single decision point — one prompt evaluation, tool call, file operation, or model response — where one or more detectors fired and a Cedar policy matched.

Threat events aggregate across all users and coding assistants in your organization. They are not session-level summaries; they are individual, discrete records of what happened, what was detected, and what the system decided (allowed or denied). This makes them suitable for both real-time response and post-incident audit.

Code Agent threat events also appear in the [Observatory Threats view](/observatory/threats.md), where they can be correlated with browser violations, gateway blocks, and Shield guardrail events from the same user or session.

***

## Threat Categories and Severity Levels

Overwatch organizes threats into categories that correspond to the policy categories defined in your policy configuration.

| Category                 | Description                                                                             | Typical Severity    |
| ------------------------ | --------------------------------------------------------------------------------------- | ------------------- |
| **Prompt Injection**     | Direct or indirect injection attempts, jailbreaks, multi-turn trajectory drift          | High / Critical     |
| **Secrets Detection**    | API keys, tokens, credentials, or other secrets in prompts, tool calls, or responses    | High                |
| **PII Detection**        | Personally identifiable information (credit cards, SSNs, names+contact data) in content | Medium / High       |
| **Dangerous Tool Usage** | Shell execution, destructive file operations, mass-delete commands, external writes     | High / Critical     |
| **Agent Security**       | Tool poisoning, rug pull attacks, indirect injection targeting coding agents            | Critical            |
| **Content Safety**       | Violent, hateful, harmful, or sexually explicit content                                 | Medium / High       |
| **Policy Violation**     | Action denied by an organization rule, scope restriction, or custom Cedar policy        | Low / Medium / High |

Severity levels:

* **Low** — policy matched but risk is minimal; no immediate action required. Examples: a tool call slightly exceeding a risk threshold, a keyword match on a borderline term.
* **Medium** — policy matched with moderate risk; warrants review and possible policy tuning. Examples: PII detected in a file read, repeated borderline injection scores.
* **High** — clear policy violation with meaningful risk; warrants blocking or notifying the user. Examples: a secret in a prompt, a high-confidence injection attempt, a dangerous tool executed outside an approved context.
* **Critical** — severe risk requiring immediate response. Examples: tool poisoning attack, agent executing destructive shell commands with exfiltrated credentials, multi-step rug pull attempt.

***

## Step-by-Step Triage Workflow

### Finding Threats in the Console

1. Open **Highflame Studio** and navigate to your organization's Overwatch console.
2. In the left navigation, select **Threats & Violations**.
3. The page shows a summary row at the top with four counts: **Total**, **Allowed**, **Denied**, and **Threats**. Use these to orient quickly — a large gap between Total and Denied means most threats are being logged but not blocked (check your enforcement mode).
4. The event list below the summary shows individual threat records in reverse chronological order.
5. Use the available filters to narrow the list:
   * **Severity** — filter to High or Critical to triage the most urgent events first.
   * **Category** — isolate a specific threat type (e.g., Agent Security only).
   * **Code Agent** — filter to a specific IDE or assistant (for example, Cursor or Claude Code).
   * **MCP Server** — filter to events involving a specific server.
   * **User** — filter to a specific developer.
   * **Time range** — focus on a specific window.
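
The severity-first ordering that these filters give you can also be applied to exported events. The following is a minimal sketch, assuming events are available as plain dictionaries; the field names, IDs, and timestamps are hypothetical and do not reflect a documented export schema.

```python
# Hypothetical event records; field names and values are illustrative only.
SEVERITY_RANK = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}

events = [
    {"id": "evt-1", "severity": "Medium", "timestamp": "2025-01-10T09:15:00Z"},
    {"id": "evt-2", "severity": "Critical", "timestamp": "2025-01-10T09:20:00Z"},
    {"id": "evt-3", "severity": "High", "timestamp": "2025-01-10T09:05:00Z"},
    {"id": "evt-4", "severity": "Low", "timestamp": "2025-01-10T09:30:00Z"},
]

def triage_queue(events, min_severity="High"):
    """Keep events at or above min_severity, most severe first, newest first within a level."""
    floor = SEVERITY_RANK[min_severity]
    urgent = [e for e in events if SEVERITY_RANK[e["severity"]] >= floor]
    # ISO 8601 UTC timestamps in one consistent format sort correctly as strings.
    return sorted(
        urgent,
        key=lambda e: (SEVERITY_RANK[e["severity"]], e["timestamp"]),
        reverse=True,
    )

queue_ids = [e["id"] for e in triage_queue(events)]
print(queue_ids)
```

Passing `min_severity="Medium"` widens the queue once the High and Critical events have been handled.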

### Reading a Threat Event

Click any event row to open the detail view. The key fields:

| Field                | What it tells you                                                                                                                                                                                                            |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Timestamp**        | When the event was recorded (UTC).                                                                                                                                                                                           |
| **User**             | The developer whose agent session triggered the event.                                                                                                                                                                       |
| **Code Agent**       | Which AI coding assistant was in use (Cursor, Claude Code, etc.).                                                                                                                                                            |
| **MCP Server**       | Which MCP server the tool call was directed to, if applicable.                                                                                                                                                               |
| **Action**           | The runtime action: `process_prompt`, `execute_tool`, `read_file`, `write_file`, etc.                                                                                                                                        |
| **Decision**         | `allowed` or `denied` — what Overwatch actually did.                                                                                                                                                                         |
| **Severity**         | Low, Medium, High, or Critical.                                                                                                                                                                                              |
| **Detected Threats** | Structured list of what fired: injection score, secret presence, PII flag, tool risk, and so on. These are the projected context values that Cedar evaluated.                                                                |
| **Matched Policies** | Which Cedar policies contributed to the decision.                                                                                                                                                                            |
| **Payload Preview**  | A truncated view of the content that was evaluated (secrets and PII may be masked).                                                                                                                                          |
| **Session ID**       | The agent session this event belongs to. Click to view all events in the session, or open it in the [Observatory Sessions view](/observatory/sessions.md) for a cross-product timeline including gateway and browser events. |
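
If you process events outside the console, it can help to model these fields explicitly. The sketch below mirrors the table as a Python dataclass; the attribute names, types, and the helper method are assumptions made for illustration, not an official Highflame schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ThreatEvent:
    """Illustrative model of the detail-view fields (hypothetical names)."""
    timestamp: str                      # UTC, e.g. "2025-01-10T09:20:00Z"
    user: str
    code_agent: str                     # e.g. "Cursor", "Claude Code"
    action: str                         # e.g. "process_prompt", "execute_tool"
    decision: str                       # "allowed" or "denied"
    severity: str                       # "Low", "Medium", "High", or "Critical"
    session_id: str
    mcp_server: Optional[str] = None    # only set when a tool call targeted a server
    detected_threats: dict = field(default_factory=dict)
    matched_policies: list = field(default_factory=list)
    payload_preview: str = ""

    def allowed_despite_high_risk(self) -> bool:
        """True when a High or Critical event was logged but not blocked."""
        return self.decision == "allowed" and self.severity in ("High", "Critical")

event = ThreatEvent(
    timestamp="2025-01-10T09:20:00Z",
    user="dev@example.com",
    code_agent="Cursor",
    action="execute_tool",
    decision="allowed",
    severity="High",
    session_id="sess-1",
    detected_threats={"injection_score": 84},
)
print(event.allowed_despite_high_risk())
```

Events for which `allowed_despite_high_risk()` is true are the ones most worth checking against your enforcement mode.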

### Determining True Positive vs. False Positive

Before taking a response action, determine whether the threat is genuine.

**Signs it is a true positive:**

* The `injection_score` is 80 or above and the payload contains clear instruction-override language.
* `contains_secrets` is true and the payload preview shows a recognizable credential format.
* The tool risk is `high` and the action is a destructive shell command or external write outside any approved context.
* The matched policies are from the Agent Security category (tool poisoning or rug pull indicators).
* Multiple detectors fired on the same event (correlated signals increase confidence).

**Signs it may be a false positive:**

* The injection score is in the 40–65 range and the payload is a rephrased technical question.
* PII was detected but the content is a test fixture or synthetic data with fake personal information.
* A tool is flagged as `high` risk but it is a well-understood internal tool operating in an approved environment.
* The matched policy has a threshold that has not yet been calibrated to your organization's traffic patterns.
* The same user has many similar events all flagged at the same low-medium severity with no escalation pattern.

If you are unsure, use the Playground to replay the payload against your current policy set and examine the projected context. See [How to Tune Policies After a False Positive](#how-to-tune-policies-after-a-false-positive) below.
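
The signals listed above can be encoded as a rough first-pass sorter for a review queue. This is only a sketch: the function name, the category argument, and the two-signal rule are assumptions, and the output is a hint for a human reviewer rather than a verdict, since several of the signs depend on reading the payload itself.

```python
def classify_event(detected, matched_categories=()):
    """Return 'likely_true_positive', 'possible_false_positive', or 'needs_review'.

    `detected` is a dict of projected context values such as injection_score,
    contains_secrets, and tool_risk. The cut-offs mirror the ranges in this guide.
    """
    score = detected.get("injection_score", 0)
    strong_signals = [
        score >= 80,
        bool(detected.get("contains_secrets", False)),
        detected.get("tool_risk") == "high",
        "Agent Security" in matched_categories,
    ]
    count = sum(strong_signals)
    if count >= 2:
        # Correlated signals on one event increase confidence.
        return "likely_true_positive"
    if count == 0 and 40 <= score <= 65:
        # Borderline injection score with nothing else firing.
        return "possible_false_positive"
    return "needs_review"

print(classify_event({"injection_score": 85, "contains_secrets": True}))
print(classify_event({"injection_score": 50}))
print(classify_event({"injection_score": 72}))
```

A single strong signal deliberately lands in `needs_review`, because confirming it requires inspecting the payload preview.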

***

## Response Actions by Severity

### Low and Medium Threats

For low and medium severity events where no immediate harm has occurred:

1. **Review the event detail.** Confirm whether the detection was accurate. Check the payload preview and matched policies.
2. **Check for patterns.** Navigate to the session view to see whether this is an isolated event or part of a sequence. Multiple low-severity events in a session can indicate escalation.
3. **Tune the policy if needed.** If the event is a false positive, adjust the threshold or add an exception. Use the Playground to validate before deploying. See the [tuning section](#how-to-tune-policies-after-a-false-positive) below.
4. **Add to monitoring if borderline.** If the signal is genuine but below your enforcement threshold, consider lowering the threshold incrementally in `monitor` mode to observe the rate before enforcing.
5. **No user notification required** for most low/medium events unless a pattern is emerging.
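
The pattern check in step 2 can be approximated by counting low and medium events per session. The sketch below assumes events are available as dictionaries with hypothetical `session_id` and `severity` keys; the cut-off of three events is an arbitrary starting point, not a recommended value.

```python
from collections import Counter

def sessions_with_repeated_events(events, min_events=3):
    """Return session IDs that have at least min_events Low or Medium events."""
    counts = Counter(
        e["session_id"] for e in events if e["severity"] in ("Low", "Medium")
    )
    return sorted(s for s, n in counts.items() if n >= min_events)

# Hypothetical sample data.
sample_events = [
    {"session_id": "sess-a", "severity": "Low"},
    {"session_id": "sess-a", "severity": "Medium"},
    {"session_id": "sess-a", "severity": "Medium"},
    {"session_id": "sess-b", "severity": "Low"},
    {"session_id": "sess-b", "severity": "High"},
]

flagged = sessions_with_repeated_events(sample_events)
print(flagged)
```

Sessions returned here are candidates for the session view, where you can see whether the events form an escalating sequence.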

### High Severity Threats

For high severity events where a clear policy violation has occurred:

1. **Confirm the decision.** Check whether the event was `denied` or `allowed`. If it was `allowed`, your policies may be in `monitor` mode, or the detector score may not have reached the policy threshold. Consider whether the policy should be tightened.
2. **Block the specific tool or server if the risk is targeted.** If the threat involves a specific MCP server or tool, disable that server from the console while you investigate. See [How to Disable a Specific MCP Server](#how-to-disable-a-specific-mcp-server-from-the-console) below.
3. **Notify the affected user.** Reach out to the developer whose session triggered the event. Explain what was detected and what action was taken. In most cases this is informational — the developer may not have been aware of the injection in a retrieved document or MCP tool description.
4. **Review the session.** Click the Session ID to view the full event history for that session. For a deeper view including distributed traces and cross-product events, open the session in [Observatory Sessions](/observatory/sessions.md). Check whether other tools were called before or after the threat event that could indicate a broader compromise.
5. **Update policy thresholds.** If the event was a true positive that was allowed because its score fell below the current threshold, tighten the threshold (for a `>=` rule, lower it) and redeploy in `enforce` mode.

### Critical Threats

Critical threats require immediate action and coordination.

1. **Revoke the agent session immediately.** From the Sessions view, locate the session associated with the event and revoke it. This terminates the active agent connection and prevents further tool execution in that session.
2. **Disable the MCP server involved.** If the threat originated from or was directed at a specific MCP server, disable it from the Discovery page. See [How to Disable a Specific MCP Server](#how-to-disable-a-specific-mcp-server-from-the-console) below.
3. **Notify the affected user and their team.** Inform the developer and, depending on what was accessed or executed, their team or manager.
4. **Preserve the event record.** Do not delete or modify the threat event. It is the primary artifact for post-incident review.
5. **Escalate to your security team.** Follow your organization's incident response process. See the [Escalation Path](#escalation-path-for-critical-incidents) section below.
6. **Conduct a session replay.** Review all events in the session to reconstruct what the agent did, what data it accessed, and whether any exfiltration or destructive action completed before the block.
7. **Review for lateral exposure.** Check whether the same MCP server is connected to other developers' agents. If the threat was a tool poisoning attack embedded in the MCP server's tool descriptions, all users of that server are potentially affected.
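
For the lateral exposure review in step 7, the underlying question is which other developers are connected to the implicated server. A minimal sketch, assuming you have a list of (user, server) pairs from the Discovery data; the data shape and all names below are hypothetical.

```python
def exposed_users(connections, server, exclude_user=None):
    """Return the sorted, de-duplicated users connected to `server`.

    `connections` is an iterable of (user, server_name) pairs. `exclude_user`
    lets you leave out the developer whose session triggered the event.
    """
    return sorted({u for u, s in connections if s == server and u != exclude_user})

# Hypothetical connection data.
connections = [
    ("alice", "internal-docs-mcp"),
    ("bob", "internal-docs-mcp"),
    ("carol", "ticketing-mcp"),
    ("bob", "internal-docs-mcp"),  # duplicate rows are collapsed
]

others = exposed_users(connections, "internal-docs-mcp", exclude_user="alice")
print(others)
```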

***

## How to Disable a Specific MCP Server from the Console

When a threat is linked to a specific MCP server, you can disable that server for your organization directly from the Overwatch console.

1. Navigate to **Discovery** in the left navigation.
2. Locate the MCP server in the **MCP Servers Overview** table. You can search by server name or filter by the code agent.
3. Click the server row to open the server detail panel.
4. In the server detail panel, find the **Status** control and toggle it to **Disabled**.
5. Confirm the action when prompted. The server will be blocked for all new connections across the organization immediately. Existing sessions may continue until revoked.
6. Optionally, navigate to **Sessions** and revoke any active sessions that were connected to this server.

To re-enable the server after investigation is complete, return to the server detail panel and toggle Status back to **Enabled**. Document the reason for re-enabling in your incident record.

***

## How to Tune Policies After a False Positive

A false positive is a legitimate request that was incorrectly flagged or denied. Tuning policies reduces false positives while preserving protection against genuine threats.

**Step 1: Identify the matching policy.**

Open the false-positive event detail in Threats & Violations and note the **Matched Policies** field. This tells you which Cedar rule caused the deny.

**Step 2: Understand the projected context.**

Note the **Detected Threats** field. These are the context key values that Cedar evaluated. For example: `injection_score: 72`, `tool_risk: high`, `content_type: tool_call`.

**Step 3: Open the Playground.**

Navigate to **Policy** in the left navigation, then open the **Playground**. The Playground lets you test policy changes against specific payloads interactively.

**Step 4: Paste or type a representative payload.**

Enter a payload that resembles the false-positive request. Run it against your current policy set to confirm it reproduces the deny decision.

**Step 5: Adjust the policy.**

Options for reducing false positives without removing protection:

* **Raise the threshold.** If `injection_score >= 60` is matching legitimate rephrased instructions, raise it to `>= 75` and observe the impact.
* **Narrow the scope.** Add a `content_type` or `action` condition to limit the policy to the cases you actually want to block. For example, apply injection blocking only to `content_type == "prompt"` if false positives are occurring on `tool_call` content.
* **Add an exception.** Use a Cedar `permit` with a specific resource or principal condition to carve out the legitimate case while keeping the broader forbid in place.
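
To reason about the first two options before editing the policy, it can help to model the rule as a plain predicate. The sketch below is Python, not Cedar, and it does not call any Highflame API; it simply mimics a single forbid rule with a threshold and an optional `content_type` scope, using the example values from this guide.

```python
def forbids(context, threshold=60, content_types=None):
    """Model of one forbid rule: deny when the content type is in scope
    and injection_score meets the threshold. content_types=None means any type."""
    in_scope = content_types is None or context["content_type"] in content_types
    return in_scope and context["injection_score"] >= threshold

# Example contexts (illustrative values).
false_positive = {"injection_score": 72, "content_type": "tool_call"}
known_malicious = {"injection_score": 91, "content_type": "prompt"}

variants = {
    "current (>= 60)": {"threshold": 60},
    "raised (>= 75)": {"threshold": 75},
    "scoped to prompts": {"threshold": 60, "content_types": {"prompt"}},
}

for label, kwargs in variants.items():
    fp = "denied" if forbids(false_positive, **kwargs) else "allowed"
    mal = "denied" if forbids(known_malicious, **kwargs) else "allowed"
    print(f"{label}: false positive {fp}, known-malicious {mal}")
```

Both tuned variants allow the legitimate request while still denying the malicious one, which is the property to confirm in the Playground against your real policy set.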

**Step 6: Validate in the Playground.**

After making the change, re-run the false-positive payload and confirm it is now allowed. Also test a known-malicious payload to confirm the policy still denies it.

**Step 7: Deploy in `monitor` mode first.**

Apply the updated policy with the application still in `monitor` mode. Monitor the Threats & Violations feed for one to two days to confirm the false-positive rate has decreased without increasing the miss rate.

**Step 8: Switch to `enforce`.**

Once you are satisfied with the tuned behavior, switch the application to `enforce` mode. Monitor closely for the first few hours after the mode change.

***

## How to Set Up Alerts So Threats Reach Your Team

Threat events in the console are useful for triage, but for real-time response your team needs to receive alerts in the channels they already monitor.

Highflame supports two primary alert destinations:

* **Slack** — send threat alerts to a channel via an incoming webhook.
* **Splunk** — forward structured threat events to your SIEM via the HTTP Event Collector.

For setup instructions, see [Alerts](/integrations/alerts.md).

**Scoping alerts to Code Agent threats specifically:**

After configuring your integration, use the `trigger_condition` field to restrict alerting to the threat types and severity levels relevant to your code agent security program. For example, to receive alerts only for agent-security-specific threats:

```json
{
  "trigger_condition": {
    "threats": [
      "prompt_injection_detected",
      "tool_poisoning_detected",
      "secret_detected",
      "dangerous_tool_execution"
    ]
  }
}
```

This prevents high-volume low-severity events from flooding your alert channel while ensuring critical threats reach your team immediately. Refer to the Shield API reference for the full list of supported threat type identifiers.
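
To sanity-check a trigger configuration before relying on it, you can model the match locally. The sketch below assumes an alert fires when any threat on the event appears in the `threats` list; that matching rule and the `should_alert` helper are assumptions for illustration, so confirm the actual semantics in the Alerts documentation.

```python
import json

# Same configuration as the example above.
trigger_config = json.loads("""
{
  "trigger_condition": {
    "threats": [
      "prompt_injection_detected",
      "tool_poisoning_detected",
      "secret_detected",
      "dangerous_tool_execution"
    ]
  }
}
""")

def should_alert(event_threats, config):
    """Assumed rule: alert if at least one event threat is in the configured list."""
    wanted = set(config["trigger_condition"]["threats"])
    return bool(wanted & set(event_threats))

print(should_alert(["secret_detected"], trigger_config))
# "pii_detected" is a hypothetical identifier that is not in the list above.
print(should_alert(["pii_detected"], trigger_config))
```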

***

## Escalation Path for Critical Incidents

When a critical threat event requires escalation beyond the standard triage workflow:

1. **Immediate containment (0–15 minutes):**
   * Revoke the affected agent session.
   * Disable the implicated MCP server if one is involved.
   * Confirm whether any destructive or exfiltration action completed before the block.
2. **Initial notification (15–60 minutes):**
   * Notify the affected developer and their manager.
   * Page or message your security team via your standard incident channel.
   * Create an incident record in your ticketing system with the threat event ID, session ID, and a summary of what was detected.
3. **Investigation (1–4 hours):**
   * Replay the full session in the console to reconstruct the sequence of actions. Use [Observatory Sessions](/observatory/sessions.md) to see a unified timeline that includes gateway events and browser activity from the same user.
   * Use [Observatory Traces](/observatory/traces.md) to view the distributed trace for the agent run, including LLM calls, tool invocations, and latency breakdown.
   * Determine whether the threat was sourced from an external MCP server (tool poisoning) or originated in user input (direct injection).
   * Check whether other developers use the same MCP server or have had similar events.
   * Export the event record from Threats & Violations for your incident file.
4. **Remediation:**
   * For tool poisoning: work with the MCP server owner to identify and remove the poisoned tool description. Keep the server disabled until it is remediated and re-scanned.
   * For credential exposure: rotate any secrets that were present in the detected payload immediately, regardless of whether exfiltration is confirmed.
   * For destructive execution: assess blast radius and restore from backup as appropriate.
5. **Post-incident:**
   * Update Cedar policies to prevent recurrence.
   * Lower thresholds or add targeted rules for the detected pattern.
   * Document findings and distribute a brief post-mortem to the engineering team.
   * Re-enable blocked servers only after verification that the threat has been remediated.
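
The time windows above can be turned into a small helper for incident tooling, for example to label where a response should be given the time elapsed since detection. This is a sketch: the function name is hypothetical, and because the guide gives no time window for remediation or post-incident work, anything beyond four hours is grouped into one label.

```python
def escalation_phase(minutes_since_detection: float) -> str:
    """Map elapsed minutes to the escalation phase described in this guide."""
    if minutes_since_detection < 0:
        raise ValueError("elapsed time cannot be negative")
    if minutes_since_detection < 15:
        return "Immediate containment"
    if minutes_since_detection < 60:
        return "Initial notification"
    if minutes_since_detection < 240:
        return "Investigation"
    return "Remediation and post-incident"

for minutes in (5, 30, 90, 300):
    print(minutes, escalation_phase(minutes))
```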

***

## What's Next

* [Setting Up Policies](/code-agents/setting-up-policies.md) — define and manage Cedar policies for code agents
* [Discovery & Metrics](/code-agents/discovery-and-metrics.md) — understand which MCP servers and tools are in use
* [Quick Start](/code-agents/quick-start.md) — initial setup for Code Agent Security
* [Alerts](/integrations/alerts.md) — configure Slack and Splunk alert integrations
* [Cedar Policy Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — practical Cedar patterns for tuning guardrails

