# Chat Assistant

The `chat_assistant` profile is designed for customer-facing AI products — support bots, consumer chatbots, and any deployment where the user population is untrusted and content safety requirements are strict. It lowers detection thresholds below the defaults and adds bidirectional content controls.

***

## Why public-facing deployments need stricter thresholds

The [default policies](/agent-authorization-and-control-shield/policy-templates.md#default-policies) are calibrated for internal or semi-trusted deployments where the baseline user is a known employee or developer. Public-facing chat applications face a different threat profile:

* **Higher jailbreak and injection attempt rates** — consumer chatbots are frequent targets for prompt injection and jailbreak attempts from the general public
* **Content safety requirements** — outputs that would be acceptable in an internal tool may be inappropriate or harmful in a consumer product
* **Bidirectional PII risk** — users may inadvertently send personal information, and models may generate it in responses

The `chat_assistant` profile lowers injection and jailbreak thresholds and adds output-side controls that the defaults do not include.

***

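## Profile files

The profile consists of three Cedar policy files: `security.cedar`, `privacy.cedar`, and `trust_safety.cedar`.
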
***

### security.cedar — Tighter injection and jailbreak thresholds

Lowers the confidence score at which prompt injection and jailbreak attempts are blocked. The defaults block at a confidence of approximately 80; this profile blocks at 70 for injection and 65 for jailbreaks.

| Rule                   | Threshold                    | Default threshold |
| ---------------------- | ---------------------------- | ----------------- |
| Prompt injection block | `injection_confidence >= 70` | \~80              |
| Jailbreak block        | `jailbreak_confidence >= 65` | \~80              |

```cedar
@id("chat-injection-lower-threshold")
@severity("high")
forbid(principal, action == Guardrails::Action::"process_prompt", resource)
when { context has injection_confidence && context.injection_confidence >= 70 };

@id("chat-jailbreak-lower-threshold")
@severity("high")
forbid(principal, action == Guardrails::Action::"process_prompt", resource)
when { context has jailbreak_confidence && context.jailbreak_confidence >= 65 };
```

These lower thresholds will catch more borderline cases at the cost of a higher false positive rate. Review Monitor-mode detections during rollout to confirm signal quality against your user population before switching to Block.
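
To get a feel for what the lower thresholds change, the comparison below replays a few confidence scores against both sets of cutoffs. It is a plain-Python sketch, not the Shield evaluation engine: the `would_block` helper and the sample scores are hypothetical, and the default cutoff is assumed to be exactly 80 even though the defaults are only described as approximately 80.

```python
# Illustrative only: mirrors the `>=` comparisons in security.cedar.
# The default cutoff of 80 is an assumption (documented as "approximately 80").
DEFAULT_THRESHOLDS = {"injection_confidence": 80, "jailbreak_confidence": 80}
CHAT_ASSISTANT_THRESHOLDS = {"injection_confidence": 70, "jailbreak_confidence": 65}


def would_block(context: dict, thresholds: dict) -> bool:
    """Return True if any score present in the context meets its cutoff.

    A missing key never blocks, matching the `context has ...` guard.
    """
    return any(
        key in context and context[key] >= cutoff
        for key, cutoff in thresholds.items()
    )


# Hypothetical detector outputs for four prompts.
samples = [
    {"injection_confidence": 85},  # meets both cutoffs
    {"injection_confidence": 72},  # meets only the profile cutoff
    {"jailbreak_confidence": 66},  # meets only the profile cutoff
    {"injection_confidence": 40, "jailbreak_confidence": 30},  # meets neither
]

default_blocks = [would_block(s, DEFAULT_THRESHOLDS) for s in samples]
profile_blocks = [would_block(s, CHAT_ASSISTANT_THRESHOLDS) for s in samples]
print(default_blocks)
print(profile_blocks)
```

The prompts scoring between the profile cutoff and the default cutoff are the borderline cases worth reviewing in Monitor mode.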

***

### privacy.cedar — Bidirectional PII protection

Extends the default PII policy (inputs only) to also block PII in model outputs. This blocks responses that contain personal information, including cases where the model produces PII that did not appear in the user's input.

```cedar
@id("chat-pii-block-input-output")
@severity("high")
forbid(principal, action == Guardrails::Action::"process_prompt", resource)
when { context has pii_detected && context.pii_detected == true };
```

The `direction` context field distinguishes input from output evaluations. Because this rule does not test `direction`, it applies to both.
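
The effect of leaving `direction` out of the condition can be checked with a small Python mirror of the rule. This is a sketch for reasoning about the policy, not SDK code: `pii_rule_matches` is a hypothetical helper, and the `"input"` and `"output"` strings are assumed values for `direction`, which this page does not spell out.

```python
def pii_rule_matches(context: dict) -> bool:
    """Mirror of: context has pii_detected && context.pii_detected == true."""
    return context.get("pii_detected") is True


# Hypothetical evaluation contexts; the direction values are assumptions.
evaluations = [
    {"direction": "input", "pii_detected": True},
    {"direction": "output", "pii_detected": True},
    {"direction": "output", "pii_detected": False},
    {"direction": "input"},  # no pii_detected key: the `has` guard fails
]

# The rule never reads `direction`, so only `pii_detected` decides the result.
pii_results = [(e["direction"], pii_rule_matches(e)) for e in evaluations]
print(pii_results)
```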

***

### trust\_safety.cedar — Toxicity and topic restrictions

**Toxicity block:**

Blocks violent, hateful, sexually explicit, and profane content at a toxicity score of 70 or higher. The default toxicity policy may be more permissive; this profile enforces a stricter threshold appropriate for consumer-facing deployments.

```cedar
@id("chat-toxicity-strict")
@severity("high")
forbid(principal, action, resource)
when { context has toxicity_score && context.toxicity_score >= 70 };
```

**Topic restrictions:**

Blocks responses where the topic classifier identifies content in restricted categories, when the classifier confidence is 70 or higher:

| Restricted topic      | Examples                                                          |
| --------------------- | ----------------------------------------------------------------- |
| Weapons manufacturing | Instructions for producing firearms, explosives, chemical weapons |
| Illegal activity      | Detailed guidance for committing specific crimes                  |
| Controlled substances | Manufacturing or procurement instructions                         |
| Financial fraud       | Step-by-step guidance for fraud schemes                           |

```cedar
@id("chat-topic-restriction")
@severity("high")
forbid(principal, action == Guardrails::Action::"process_prompt", resource)
when {
    context has topic_confidence && context.topic_confidence >= 70 &&
    context has content_topics &&
    (context.content_topics.contains("weapons_manufacturing") ||
     context.content_topics.contains("illegal_activity") ||
     context.content_topics.contains("controlled_substances") ||
     context.content_topics.contains("financial_fraud"))
};
```

Topic restrictions apply to model outputs — the evaluation runs after the model generates its response. If a response is blocked, the user receives an error and the violation is recorded in Studio.
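
The condition in `chat-topic-restriction` combines a confidence cutoff with a set-membership check. A Python mirror of that logic can be useful for deciding which classifier outputs would trigger the rule before deploying it. The `topic_rule_matches` helper below is hypothetical and only restates the Cedar condition shown above; it does not call the Shield service.

```python
# Topic labels and cutoff copied from the chat-topic-restriction rule.
RESTRICTED_TOPICS = {
    "weapons_manufacturing",
    "illegal_activity",
    "controlled_substances",
    "financial_fraud",
}
TOPIC_CONFIDENCE_CUTOFF = 70


def topic_rule_matches(context: dict) -> bool:
    """Return True when the Cedar `when` clause would evaluate to true."""
    # Both `context has ...` guards must pass before any comparison.
    if "topic_confidence" not in context or "content_topics" not in context:
        return False
    if context["topic_confidence"] < TOPIC_CONFIDENCE_CUTOFF:
        return False
    # Any one restricted label is enough, like the chain of `||` clauses.
    return any(topic in RESTRICTED_TOPICS for topic in context["content_topics"])


print(topic_rule_matches({"topic_confidence": 70, "content_topics": ["financial_fraud"]}))
print(topic_rule_matches({"topic_confidence": 69, "content_topics": ["financial_fraud"]}))
print(topic_rule_matches({"topic_confidence": 95, "content_topics": ["cooking"]}))
```

Note that a high confidence alone is not sufficient: the third call does not match because `"cooking"` (a made-up label) is not in the restricted set.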

***

## Applying the profile

{% tabs %}
{% tab title="Python" %}

```python
from highflame.shield import GuardrailsClient

client = GuardrailsClient(api_key="...")

client.policies.load_profile("chat_assistant/*")
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import { GuardrailsClient } from "@highflame/sdk";

const client = new GuardrailsClient({ apiKey: "..." });

await client.policies.loadProfile("chat_assistant/*");
```

{% endtab %}
{% endtabs %}

***

## Customizing topic restrictions

The built-in topic list covers the most common restricted categories. Add your own topics using a custom Cedar rule alongside the profile:

```python
# `client` is the GuardrailsClient instance created under "Applying the profile".
client.policies.load_profile("chat_assistant/*")
client.policies.load_cedar("""
    @id("custom-block-competitor-discussion")
    forbid(principal, action == Guardrails::Action::"process_prompt", resource)
    when {
        context has content_topics &&
        context.content_topics.contains("competitor_product") &&
        context has topic_confidence && context.topic_confidence >= 75
    };
""")
```

***

## Rollout guidance

1. **Deploy in Monitor mode first.** The lower injection and jailbreak thresholds will generate more events than the defaults. Run for 1–2 weeks to establish a detection baseline against your actual user traffic.
2. **Review false positives.** In [Observatory Threats](/observatory/threats.md), filter to `mode = monitor` and `policy_category = security` to see what the lower thresholds are catching. If legitimate queries are flagged, consider raising the thresholds slightly in a custom override.
3. **Enable PII output blocking early.** The bidirectional PII rule has a low false positive rate on most chat applications and can be switched to Block without extended monitoring.
4. **Enable toxicity and topic restrictions in Block last.** These depend on classifier confidence and may need tuning for your specific subject matter.
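
The false positive review in step 2 can be summarized per policy once Monitor-mode detections have been labeled by a reviewer. The sketch below assumes the detections are available as a list of dictionaries; the `policy_id` and `reviewed_as` field names and the `"legitimate"` label are hypothetical and do not reflect a documented Observatory export format. The policy IDs are the ones defined in `security.cedar`.

```python
from collections import Counter


def false_positive_rates(events: list) -> dict:
    """Return the share of detections per policy that a reviewer marked legitimate."""
    total = Counter()
    false_positives = Counter()
    for event in events:
        total[event["policy_id"]] += 1
        if event["reviewed_as"] == "legitimate":
            false_positives[event["policy_id"]] += 1
    return {policy: false_positives[policy] / total[policy] for policy in total}


# Hypothetical, hand-labeled Monitor-mode detections.
events = [
    {"policy_id": "chat-injection-lower-threshold", "reviewed_as": "attack"},
    {"policy_id": "chat-injection-lower-threshold", "reviewed_as": "attack"},
    {"policy_id": "chat-injection-lower-threshold", "reviewed_as": "attack"},
    {"policy_id": "chat-injection-lower-threshold", "reviewed_as": "legitimate"},
    {"policy_id": "chat-jailbreak-lower-threshold", "reviewed_as": "attack"},
    {"policy_id": "chat-jailbreak-lower-threshold", "reviewed_as": "legitimate"},
]

rates = false_positive_rates(events)
print(rates)
```

A policy whose rate stays high after the monitoring window is a candidate for a slightly higher threshold before it is switched to Block.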

***

## Related

* [Policy Templates](/agent-authorization-and-control-shield/policy-templates.md) — all available profiles and selection guide
* [Advanced Detection Policies](/agent-authorization-and-control-shield/policy-templates/advanced-detection-policies.md) — add ML-enhanced PII detection for regulated consumer applications
* [Cedar Cookbook](/agent-authorization-and-control-shield/cedar-cookbook.md) — custom topic restrictions and threshold tuning

