# Abuse Detection and Control

Agentic abuse is distinct from prompt injection or data exfiltration — it targets the agent's execution lifecycle rather than its content. A looping agent, a session consuming unbounded tokens, or a tool call sequence quietly building toward exfiltration can each cause significant damage without triggering a single content-based detector.

Highflame detects and enforces against these patterns through a combination of session-aware signal tracking, behavioral pattern matching, and Cedar policies that evaluate context accumulated across multiple turns — not just the current request.

***

## Abuse scenarios and controls

### Agent loops

**What it is:** An agent repeatedly invokes the same tool without making progress — either stuck in an error-recovery cycle, manipulated by a prompt to repeat an action, or misconfigured such that the model always produces the same tool call.

**Detectors:**

* `loop_detected` (boolean) — fires when the same tool is invoked consecutively above a count threshold
* `loop_count` (integer) — the number of consecutive identical tool invocations observed so far in the session
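
To make the relationship between the two signals concrete, here is a minimal Python sketch of a consecutive-invocation counter. It is an illustrative model only, not Highflame's detector: the `LoopTracker` class, its `threshold` field, and the tool names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopTracker:
    """Tracks consecutive identical tool invocations within one session (illustrative)."""
    threshold: int = 5                 # hypothetical count threshold
    last_tool: Optional[str] = None
    loop_count: int = 0

    def observe(self, tool_name: str) -> dict:
        # Extend the current run if the same tool repeats; otherwise start a new run.
        if tool_name == self.last_tool:
            self.loop_count += 1
        else:
            self.last_tool = tool_name
            self.loop_count = 1
        return {
            "loop_count": self.loop_count,
            "loop_detected": self.loop_count > self.threshold,
        }


tracker = LoopTracker(threshold=3)
signals = [tracker.observe(t) for t in ["search", "fetch", "fetch", "fetch", "fetch"]]
```

In this model a call to a different tool resets the run, so only the fourth consecutive `fetch` call exceeds the threshold of 3 and sets `loop_detected`.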

**Default policy (`agentic_safety.cedar`):**

```cedar
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when { context has loop_detected && context.loop_detected == true };
```

**Code agent profile (`code_agent/agentic_security.cedar`)** adds a count-based gate: the block applies only once `loop_count` exceeds 5 (that is, from the sixth consecutive invocation), giving the agent room to legitimately retry a failed tool before the block kicks in:

```cedar
@id("code-block-loops")
@severity("high")
forbid(principal is Guardrails::Agent, action == Guardrails::Action::"call_tool", resource)
when {
    context has loop_detected && context.loop_detected == true &&
    context has loop_count && context.loop_count > 5
};
```

**Tuning:** If your agents legitimately retry failed tool calls (e.g., a flaky external API), raise the `loop_count` threshold in a custom policy scoped to that agent identity. Do not disable loop detection entirely — even well-intentioned retries can spin indefinitely if the upstream is down.

***

### Token budget overruns

**What it is:** A session consumes tokens far beyond what is expected for its task — either due to a model stuck in a reasoning loop, a context window growing uncontrolled across many turns, or an adversarial input designed to inflate token usage as a denial-of-service against your LLM API budget.

**Detector:**

* `budget_exceeded` (boolean) — fires when the session has consumed more tokens than its configured limit

**Default policy (`agentic_safety.cedar`):**

```cedar
forbid(principal, action, resource)
when { context has budget_exceeded && context.budget_exceeded == true };
```

When `budget_exceeded` fires, the current action is blocked. Because the policy matches every principal, action, and resource, the session is effectively terminated: every subsequent tool call or prompt in that session is denied until a new session begins.

**Setting a budget:** Configure token budgets per agent or per application in **Highflame Studio** → **Shield** → **Policies**. The budget applies to the sum of input and output tokens across all turns in the session. Set separate limits for development agents (low limit, fail fast) and production agents (higher limit, with alerting before the hard block).

**Monitoring:** The [Observatory Sessions view](/observatory/sessions.md) shows `budget_exceeded` as a flag in the session detail panel, and the total tokens in/out for each session. Use this to calibrate budgets — set the hard limit at \~150% of your 95th percentile session token usage.
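
The calibration rule above (hard limit at roughly 150% of the 95th percentile) can be worked through as a short Python sketch. The function names and sample totals are hypothetical, and the nearest-rank percentile method is an assumption; any standard percentile definition would serve.

```python
import math


def percentile_nearest_rank(values: list[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggested_hard_limit(session_totals: list[int], headroom: float = 1.5) -> int:
    """Hard token limit at roughly 150% of the 95th percentile session total."""
    return math.ceil(percentile_nearest_rank(session_totals, 95) * headroom)


# Hypothetical per-session totals (input + output tokens): 1,000 through 20,000.
session_totals = [1_000 * n for n in range(1, 21)]
limit = suggested_hard_limit(session_totals)
```

With these sample figures the 95th percentile is 19,000 tokens, so the suggested hard limit is 28,500.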

***

### Suspicious action sequences

**What it is:** Multi-step tool call patterns that, taken individually, appear benign but together form a recognized attack trajectory. The pattern detector analyzes the sequence of actions across the session rather than evaluating each action in isolation.

**Detectors:**

* `suspicious_pattern` (boolean) — fires when a known attack sequence is detected
* `pattern_type` (string) — identifies which sequence was matched
* `sequence_risk` (integer, 0–100) — confidence score for the matched pattern

**Supported pattern types:**

| Pattern                | Sequence detected                                                          |
| ---------------------- | -------------------------------------------------------------------------- |
| `credential_theft`     | Read a credential file → encode the content → call a network or write tool |
| `data_exfiltration`    | Access sensitive data → transform or aggregate → call an external endpoint |
| `db_exfiltration`      | Query database → collect rows → transmit to external destination           |
| `destructive_sequence` | Enumerate files or resources → delete or overwrite in bulk                 |
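
As a rough illustration of matching a sequence across the session rather than per action, the sketch below checks whether a pattern's steps appear in order within the session's action history. The action labels and the `match_pattern` helper are hypothetical; the real detector's inputs and matching logic are not described here, and this sketch produces no `sequence_risk` score.

```python
from typing import Optional

# Hypothetical action labels; a real detector would derive these from tool calls.
PATTERNS = {
    "credential_theft": ["read_credential", "encode", "network_or_write"],
    "data_exfiltration": ["access_sensitive", "transform", "external_call"],
    "db_exfiltration": ["db_query", "collect_rows", "external_transmit"],
    "destructive_sequence": ["enumerate", "bulk_delete_or_overwrite"],
}


def match_pattern(actions: list[str]) -> Optional[str]:
    """Return the first pattern whose steps occur in order (not necessarily adjacent)."""
    for name, steps in PATTERNS.items():
        remaining = iter(actions)
        # `step in remaining` consumes the iterator up to the match, so order is enforced.
        if all(step in remaining for step in steps):
            return name
    return None


in_order = match_pattern(
    ["list_dir", "read_credential", "summarize", "encode", "network_or_write"]
)
out_of_order = match_pattern(["encode", "read_credential", "network_or_write"])
```

Here `in_order` matches `credential_theft` even though unrelated actions are interleaved, while `out_of_order` matches nothing because the encode step precedes the read.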

**Default policy (`agentic_safety.cedar`):**

```cedar
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when {
    context has suspicious_pattern && context.suspicious_pattern == true &&
    context has sequence_risk && context.sequence_risk > 75
};
```

**Code agent profile** adds type-specific rules. For credential theft, for example, any non-first-party agent is blocked immediately when the pattern fires regardless of `sequence_risk`:

```cedar
@id("code-block-credential-theft")
@severity("critical")
forbid(principal is Guardrails::Agent, action == Guardrails::Action::"call_tool", resource)
when {
    context has agent_trust_level && context.agent_trust_level != "first_party" &&
    context has suspicious_pattern && context.suspicious_pattern == true &&
    context has pattern_type && context.pattern_type == "credential_theft"
};
```

**Tuning:** Sequence detection is inherently context-sensitive. If a legitimate workflow matches a pattern (e.g., an ETL agent that reads a config file, transforms it, and posts to an API), carve it out explicitly. A Cedar `permit` alone cannot do this, because a matching `forbid` always overrides a `permit`; instead, scope the exception into the `forbid` itself, for example with an `unless` clause that names that agent identity and tool set.

***

### Cumulative session risk

**What it is:** Individual events in a session each carry a risk contribution. As risk accumulates across turns — even if no single event crosses a block threshold — the session-level circuit breakers engage.

**Session context fields:**

* `session_cumulative_risk_score` (integer) — running total of risk contributions across all turns
* `session_threat_turns` (integer) — number of turns in which at least one threat signal fired
* `session_max_injection_score` (integer) — peak injection score seen in any single turn

These fields are evaluated on every request in the session, so policies can respond to the session's history — not just the current message.

**Multi-agent system (MAS) profile thresholds (`multi_agent/agent_safety.cedar`):**

| Threshold                                                           | Effect                                                 |
| ------------------------------------------------------------------- | ------------------------------------------------------ |
| `session_cumulative_risk_score > 200`                               | Non-first-party agents restricted from sensitive tools |
| `session_cumulative_risk_score > 500` or `session_threat_turns > 5` | Unverified agents fully locked out from all tool calls |
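
The following Python sketch models how risk can accumulate past a threshold even when no single turn is alarming. It uses the MAS thresholds from the table above; the `SessionRisk` class, the tier labels, and the rule that any positive contribution counts as a threat turn are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionRisk:
    """Illustrative model of session-level risk accumulation."""
    cumulative_risk_score: int = 0
    threat_turns: int = 0

    def record_turn(self, risk_contributions: list[int]) -> str:
        self.cumulative_risk_score += sum(risk_contributions)
        # Assumption: a turn counts as a threat turn if any contribution is positive.
        if any(r > 0 for r in risk_contributions):
            self.threat_turns += 1
        return self.tier()

    def tier(self) -> str:
        if self.cumulative_risk_score > 500 or self.threat_turns > 5:
            return "lockout"             # unverified agents lose all tool calls
        if self.cumulative_risk_score > 200:
            return "restrict_sensitive"  # non-first-party agents lose sensitive tools
        return "none"


session = SessionRisk()
tiers = [session.record_turn([90]) for _ in range(3)]
```

Three turns contributing 90 each leave the first two turns unrestricted, and the third pushes the running total to 270, crossing the 200 threshold.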

**A2A profile thresholds (`a2a_security/escalation_detection.cedar`)** are tighter: the circuit breakers trip at a cumulative risk score of 150 and at 3 threat turns, because A2A sessions lack an orchestrator to catch escalating risk between turns.

**Writing a custom circuit breaker:**

```cedar
@id("custom-session-lockdown-after-three-threats")
@severity("high")
forbid(principal, action == Guardrails::Action::"call_tool", resource)
when {
    context has session_threat_turns && context.session_threat_turns >= 3 &&
    context has tool_is_sensitive && context.tool_is_sensitive == true
};
```

See the [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) for more session circuit breaker patterns.

***

### Behavioral drift (rug pull)

**What it is:** An agent or tool behaves correctly for an initial period to establish trust, then pivots to a different objective — exfiltrating data, executing destructive operations, or ignoring its original instructions. This is the "rug pull" pattern: the behavior change happens after trust has been established, making it harder to catch with per-request policies.

**Detectors:**

* `rug_pull_detected` (boolean) — fires when behavioral drift is detected relative to the agent's established pattern in the session
* `rug_pull_score` (integer, 0–100) — confidence score for the drift signal; higher = more sudden and significant deviation

The rug pull detector tracks the sequence of tool calls, action types, and output patterns across the session and flags when the pattern shifts abruptly after 3 or more normal calls.
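
A deliberately simplified Python sketch of the "shift after 3 or more normal calls" idea follows. It treats the first calls as a baseline and flags the first later call that falls outside it. The `drift_index` helper and the category labels are hypothetical; the actual detector also weighs action types and output patterns and produces a `rug_pull_score`, neither of which is modeled here.

```python
from typing import Optional


def drift_index(call_categories: list[str], min_normal_calls: int = 3) -> Optional[int]:
    """Return the index of the first call outside the established baseline, or None."""
    if len(call_categories) <= min_normal_calls:
        return None  # not enough history to establish a pattern
    baseline = set(call_categories[:min_normal_calls])
    for i, category in enumerate(call_categories[min_normal_calls:], start=min_normal_calls):
        if category not in baseline:
            return i
    return None


history = ["file_system", "file_system", "database", "file_system", "external_api"]
first_drift = drift_index(history)
```

In this example the first three calls establish `file_system` and `database` as normal, so the `external_api` call at index 4 is the first flagged deviation.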

**A2A profile policy (`a2a_security/supply_chain.cedar`):**

```cedar
@id("a2a-rug-pull-agent")
@severity("critical")
forbid(principal is Guardrails::Agent, action == Guardrails::Action::"call_tool", resource)
when {
    context has rug_pull_score && context.rug_pull_score >= 70
};
```

**MAS and code agent deployments** should apply similar rules. The rug pull detector is also used in the code agent supply chain profile to catch MCP server tools that change behavior mid-session.

***

### Tool risk gating

**What it is:** Some tools carry inherent risk regardless of context — shell execution, bulk file deletion, external HTTP calls. Tool risk gating ensures that high-risk tools are accessible only to agents and sessions that meet a minimum trust and risk threshold.

**Context fields:**

* `tool_risk_score` (integer, 0–100) — risk level assigned to the tool
* `tool_category` (string) — categorical classification: `dangerous`, `sensitive`, `shell`, `file_system`, `external_api`, `mcp`, `database`
* `tool_is_sensitive` (boolean) — convenience flag for `tool_risk_score > 60`

**Default policy gates:**

| Gate                     | Condition                                               | Effect                                                    |
| ------------------------ | ------------------------------------------------------- | --------------------------------------------------------- |
| Dangerous tool block     | `tool_category == "dangerous"`                          | Blocked for all non-first-party agents                    |
| Shell execution block    | `tool_category == "shell"`                              | Blocked in code agent profile unless explicitly permitted |
| Sensitive tool threshold | `tool_risk_score > 70`                                  | Blocked in code agent profile                             |
| Autonomous agent cap     | `agent_type == "autonomous"` and `tool_risk_score > 70` | Blocked in MAS profile regardless of trust level          |
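
The gates in the table can be read as an ordered set of checks. The Python sketch below models that reading; it is not how Highflame evaluates policies (that happens in Cedar). The `blocking_gate` function, the gate labels, and the sample values `"assistant"` and `"third_party"` are hypothetical.

```python
from typing import Optional


def blocking_gate(
    profile: str,
    agent_trust_level: str,
    agent_type: str,
    tool_category: str,
    tool_risk_score: int,
    shell_explicitly_permitted: bool = False,
) -> Optional[str]:
    """Return the name of the first gate that blocks the call, or None if none applies."""
    if tool_category == "dangerous" and agent_trust_level != "first_party":
        return "dangerous_tool_block"
    if profile == "code_agent":
        if tool_category == "shell" and not shell_explicitly_permitted:
            return "shell_execution_block"
        if tool_risk_score > 70:
            return "sensitive_tool_threshold"
    if profile == "multi_agent" and agent_type == "autonomous" and tool_risk_score > 70:
        return "autonomous_agent_cap"
    return None


shell_call = blocking_gate("code_agent", "first_party", "assistant", "shell", 40)
autonomous_call = blocking_gate("multi_agent", "first_party", "autonomous", "database", 85)
```

Note that the autonomous agent cap applies even to a first-party agent, which mirrors the "regardless of trust level" wording in the table.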

**Custom per-agent tool allowlist:**

```cedar
// Permit a specific first-party agent to use shell tools
permit(
    principal == Guardrails::Agent::"agent:acme:deploy-bot:v1",
    action == Guardrails::Action::"call_tool",
    resource
)
when { context has tool_category && context.tool_category == "shell" };

// Deny shell tools to every other agent. The `unless` clause is required:
// in Cedar, a matching forbid always overrides a permit.
forbid(principal is Guardrails::Agent, action == Guardrails::Action::"call_tool", resource)
when { context has tool_category && context.tool_category == "shell" }
unless { principal == Guardrails::Agent::"agent:acme:deploy-bot:v1" };
```

Cedar does not rank rules by specificity: if any `forbid` matches a request, the request is denied even when a `permit` also matches. The `unless` clause therefore excludes the deploy bot from the broad `forbid`, leaving its `permit` in effect. See the [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) for more allowlist patterns.

***

## Monitoring abuse in Observatory

All of the above signals are visible in [Observatory](/observatory/observatory.md):

**Threats view:** Filter by **Policy category = agentic\_security** to see all abuse-related detections across your fleet. Use the **Event type** filter to isolate tool call events specifically.

**Sessions view:** The session detail panel shows `loop_detected`, `budget_exceeded`, `cumulative_risk`, and `turn_count` for every session. Sort the session list by violations (descending) to surface the sessions with the most violations first.

**Command Center:** The detector drift heatmap tracks firing rates for `loop_detected`, `budget_exceeded`, and `suspicious_pattern` over time. A sudden spike in any of these signals across many sessions indicates a systemic issue (model regression, a poisoned tool, or an active attacker campaign) rather than an isolated event.

***

## Recommended controls by deployment type

| Scenario                      | Recommended controls                                                                                                                        |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| **Customer-facing chatbot**   | Token budget per session; `chat_assistant` profile; loop detection enabled                                                                  |
| **Autonomous code agent**     | `code_agent` profile; path security; sequence detection for credential theft and destructive ops; tool risk cap at 70 for autonomous agents |
| **RAG / data pipeline**       | `data_pipeline` profile; sequence detection for data/DB exfiltration; PII zero-tolerance                                                    |
| **Multi-agent orchestration** | `multi_agent` profile; session circuit breakers at cumulative risk 200/500; post-detection lockdowns                                        |
| **Peer-to-peer agents**       | `a2a_security` profile; escalation detection with tighter thresholds (150 cumulative, 3 threat turns)                                       |
| **High-security / regulated** | All of the above + `advanced_detection` profile; alert on every agentic\_security event                                                     |

***

## Related

* [Policy Templates](/agent-authorization-and-control-shield/policy-templates.md) — which profiles include abuse detection rules
* [A2A Policies](/agent-authorization-and-control-shield/policy-templates/a2a-policies.md) — session escalation detection in peer-to-peer agent deployments
* [Multi-Agent Policies](/agent-authorization-and-control-shield/policy-templates/multi-agent-policies.md) — cross-turn session circuit breakers in orchestrated systems
* [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — writing custom thresholds and allowlists
* [Observatory Sessions](/observatory/sessions.md) — monitoring session-level agentic metrics

