# Rate Limits & Quotas

Highflame enforces per-account request quotas to ensure service stability. When a quota is exceeded, the API returns a `429 Too Many Requests` response.

### HTTP Response

```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "status": 429,
  "title": "rate_limit_exceeded",
  "detail": "Request quota exceeded. Retry after the indicated interval."
}
```
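For callers that talk to the API over raw HTTP rather than through the SDK, the body above can be turned into a log-friendly message. The following is a minimal sketch that assumes only the three fields shown in the sample response; `describe_rate_limit` is a hypothetical helper name, not part of the SDK.

```python
import json
from typing import Optional


def describe_rate_limit(status_code: int, body: str) -> Optional[str]:
    """Return a short message for a 429 response, or None for any other status.

    Hypothetical helper: it assumes the JSON body carries the "title" and
    "detail" fields shown in the sample response above.
    """
    if status_code != 429:
        return None
    problem = json.loads(body)
    return f'{problem["title"]}: {problem["detail"]}'
```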

### SDK Behavior

The SDK handles `429` responses automatically as part of its retry logic. It retries up to `max_retries` times (default: 2) with exponential backoff and jitter.

After all retries are exhausted, `RateLimitError` is raised (Python) or thrown (TypeScript).
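To give a feel for what exponential backoff looks like in practice, here is a minimal sketch of a delay schedule using the common "full jitter" variant. The base delay, the cap, and the function name `backoff_delays` are illustrative assumptions. The SDK's actual constants and jitter strategy are not documented on this page.

```python
import random


def backoff_delays(max_retries=2, base=0.5, cap=8.0, rng=random.random):
    """Yield one sleep duration (in seconds) per retry attempt.

    Assumed values: `base` and `cap` are placeholders, not the SDK's settings.
    The ceiling doubles on each attempt, and the actual delay is drawn
    uniformly between 0 and that ceiling (full jitter).
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

With the default of two retries, this produces two delays, the first bounded by 0.5 seconds and the second by 1 second. The jitter spreads out retries from many clients that were rate limited at the same moment.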

### Catching Rate Limit Errors

**Python:**

```python
from highflame import Highflame, RateLimitError

client = Highflame(api_key="hf_sk_...")

def evaluate(user_input):
    try:
        resp = client.guard.evaluate_prompt(user_input)
        return resp
    except RateLimitError as e:
        # All retries exhausted
        print(f"Rate limited: {e.status}: {e.detail}")
        # Return a graceful degradation response
        return {"decision": "allow", "degraded": True}
```

**TypeScript:**

```typescript
import { Highflame, RateLimitError } from "@highflame/sdk";

const client = new Highflame({ apiKey: "hf_sk_..." });

async function evaluate(userInput: string) {
  try {
    return await client.guard.evaluatePrompt(userInput);
  } catch (err) {
    if (err instanceof RateLimitError) {
      // All retries exhausted
      console.warn(`Rate limited: ${err.status}: ${err.detail}`);
      // Degrade gracefully
      return { decision: "allow", degraded: true };
    }
    throw err;
  }
}
```

### Adjusting Retry Behavior

Increase `max_retries` for workloads that can tolerate higher latency in exchange for fewer errors:

```python
# Python
client = Highflame(api_key="hf_sk_...", max_retries=5)
```

```typescript
// TypeScript
const client = new Highflame({ apiKey: "hf_sk_...", maxRetries: 5 });
```

Set `max_retries=0` (Python) or `maxRetries: 0` (TypeScript) to disable automatic retries and handle rate limiting entirely in your application code.
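If you do take over retries in application code, a small wrapper keeps the policy in one place. The sketch below is deliberately generic: `call_with_retry`, its parameters, and the delay values are hypothetical and not part of the SDK. The exception type to retry on is passed in by the caller.

```python
import time


def call_with_retry(fn, retry_on, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); if it raises `retry_on`, wait and try again.

    Hypothetical helper with assumed defaults. The delay doubles after each
    failed attempt, and the last failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

With a client created using `max_retries=0`, this could be invoked as `call_with_retry(lambda: client.guard.evaluate_prompt(user_input), RateLimitError)`.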

### Retry Policy Details

| Status code | Retried? | Notes                                     |
| ----------- | -------- | ----------------------------------------- |
| `429`       | Yes      | Exponential backoff with jitter           |
| `500`       | Yes      | Server error                              |
| `502`       | Yes      | Bad gateway                               |
| `503`       | Yes      | Service unavailable                       |
| `504`       | Yes      | Gateway timeout                           |
| `401`       | No       | Authentication error; retrying won't help |
| `400`       | No       | Client error; request is malformed        |
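If you implement your own retry handling, the table can be mirrored as a simple lookup. This is a sketch only: `RETRYABLE_STATUSES` and `is_retryable` are hypothetical names, and treating every status code not listed above as non-retryable is an assumption, since the table does not cover them.

```python
# Status codes the table above marks as retried.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status: int) -> bool:
    """Return True if the table above lists this status code as retried.

    Assumption: codes absent from the table are treated as non-retryable.
    """
    return status in RETRYABLE_STATUSES
```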

### High-Volume Workloads

For batch processing or high-throughput pipelines:

* Use **`monitor` mode** during initial rollout to observe traffic patterns without blocking
* **Parallelize requests**: the SDK is safe for concurrent use, so one `Highflame` instance can serve many concurrent requests
* Configure a **project-scoped client** so quota is tracked against the correct project identity

```python
# One shared client for the entire process
client = Highflame(
    api_key="hf_sk_...",
    account_id="acc_prod",
    project_id="proj_pipeline",
    max_retries=3,
)
```
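One way to parallelize calls through a single shared client is a bounded thread pool. The sketch below takes the evaluation callable as a parameter so it stays independent of the SDK; `evaluate_all` and the worker count are hypothetical choices, not SDK features or recommended settings.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(evaluate, prompts, max_workers=8):
    """Run evaluate(prompt) for each prompt concurrently.

    Results come back in the same order as `prompts`. `max_workers` is an
    assumed value that bounds how many requests are in flight at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, prompts))
```

With the shared client above, this might be called as `evaluate_all(client.guard.evaluate_prompt, prompts)`. Keeping the pool bounded matters because higher concurrency consumes quota faster and makes `429` responses more likely.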

### Quota Increases

To request higher quota limits, contact your Highflame account team, either through the [Highflame console](https://console.highflame.ai/) or via your account manager. Include your account ID and a description of your expected request volume.

