Rate Limits & Quotas

Highflame enforces per-account request quotas to ensure service stability. When a quota is exceeded, the API returns a 429 Too Many Requests response.

HTTP Response

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "status": 429,
  "title": "rate_limit_exceeded",
  "detail": "Request quota exceeded. Retry after the indicated interval."
}
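If you call the HTTP API directly rather than through the SDK, you can recognize a quota error from the JSON body. The following is a minimal standard-library sketch. It assumes the body always carries the status and title fields shown above. ApiError, parse_error, and is_rate_limited are hypothetical names, not part of the Highflame SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiError:
    """Fields of the JSON error body shown above (hypothetical type)."""
    status: int
    title: str
    detail: str


def parse_error(body: str) -> ApiError:
    """Parse a Highflame JSON error body into an ApiError."""
    data = json.loads(body)
    return ApiError(
        status=int(data["status"]),
        title=str(data["title"]),
        detail=str(data.get("detail", "")),  # detail is treated as optional here
    )


def is_rate_limited(err: ApiError) -> bool:
    # A quota error is signalled by status 429 and title "rate_limit_exceeded".
    return err.status == 429 and err.title == "rate_limit_exceeded"


body = """{
  "status": 429,
  "title": "rate_limit_exceeded",
  "detail": "Request quota exceeded. Retry after the indicated interval."
}"""
err = parse_error(body)
print(is_rate_limited(err))  # True
```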

SDK Behavior

The SDK handles 429 responses automatically as part of its retry logic. By default, it retries a rate-limited request up to max_retries times (default: 2) with exponential backoff before surfacing a RateLimitError.

After all retries are exhausted, RateLimitError is raised (Python) or thrown (TypeScript).
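To make "up to max_retries times with exponential backoff" concrete: with the default of 2, a request is attempted at most three times (one initial call plus two retries), and the upper bound on the wait doubles before each retry. The sketch below computes such a schedule using the "full jitter" variant of exponential backoff. The base delay of 0.5 s and the cap of 8 s are illustrative assumptions, not the SDK's documented values, and backoff_delays is a hypothetical helper.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int = 2,
    base: float = 0.5,  # assumed initial delay ceiling, in seconds
    cap: float = 8.0,   # assumed maximum delay, in seconds
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the sleep (in seconds) before each retry.

    Exponential backoff with "full jitter": before retry n (0-based),
    sleep a random time in [0, min(cap, base * 2**n)].
    """
    rng = rng if rng is not None else random.Random()
    delays = []
    for n in range(max_retries):
        ceiling = min(cap, base * 2 ** n)  # 0.5, 1.0, 2.0, ... up to cap
        delays.append(rng.uniform(0.0, ceiling))
    return delays


delays = backoff_delays(max_retries=2, rng=random.Random(42))
print(f"attempts: {1 + len(delays)}")  # attempts: 3
print([round(d, 3) for d in delays])
```

Randomizing the wait spreads out retries from many clients that were rate limited at the same moment, so they do not all hit the quota again in lockstep.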

Catching Rate Limit Errors

Python:

from highflame import Highflame, RateLimitError

client = Highflame(api_key="hf_sk_...")


# Hypothetical wrapper function around a single guard check
def check_prompt(user_input: str):
    try:
        return client.guard.evaluate_prompt(user_input)
    except RateLimitError as e:
        # All retries exhausted
        print(f"Rate limited: {e.status} {e.detail}")
        # Return a graceful degradation response (fails open)
        return {"decision": "allow", "degraded": True}


Adjusting Retry Behavior

Increase max_retries for workloads that can tolerate higher latency in exchange for fewer surfaced RateLimitErrors.

Set max_retries=0 to disable automatic retries and handle rate limiting entirely in your application code.
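With automatic retries disabled, the retry policy becomes your application's responsibility. One common pattern is a small wrapper that retries only on rate-limit errors, backs off between attempts, and gives up after a fixed budget. This is a sketch under stated assumptions: RateLimitError here is a local stand-in for the SDK's exception so the example runs without the SDK installed, and call_with_retry and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for the SDK's RateLimitError (assumption for this sketch)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and full jitter.

    Re-raises the last RateLimitError once max_attempts calls have failed.
    Any other exception propagates immediately, without a retry.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted
            sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")


# Usage with a fake call that is rate limited twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("quota exceeded")
    return "ok"


print(call_with_retry(flaky, sleep=lambda s: None))  # ok
```

With the real SDK you would pass something like `lambda: client.guard.evaluate_prompt(user_input)` and catch the SDK's own RateLimitError instead of the stand-in. Injecting `sleep` keeps the wrapper testable without real waits.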

Retry Policy Details

Trigger   Retried?   Notes
429       Yes        Exponential backoff with jitter
500       Yes        Server error
502       Yes        Bad gateway
503       Yes        Service unavailable
504       Yes        Gateway timeout
401       No         Authentication error; retrying won't help
400       No         Client error; the request is malformed
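The table above can be expressed as a lookup that decides whether a failed request is worth retrying, for example in your own retry loop when max_retries=0. A minimal sketch: is_retryable is a hypothetical helper, and treating status codes the table does not list (such as 403 or 404) as non-retryable is an assumption, not documented SDK behavior.

```python
from http import HTTPStatus

# Status codes retried according to the table above.
RETRYABLE = frozenset({
    HTTPStatus.TOO_MANY_REQUESTS,      # 429
    HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
    HTTPStatus.BAD_GATEWAY,            # 502
    HTTPStatus.SERVICE_UNAVAILABLE,    # 503
    HTTPStatus.GATEWAY_TIMEOUT,        # 504
})


def is_retryable(status: int) -> bool:
    """True if a response with this status should be retried.

    401 and 400 are listed as non-retryable; any status not in the
    table is also treated as non-retryable (assumption).
    """
    return status in RETRYABLE  # HTTPStatus members compare equal to plain ints


print([s for s in (400, 401, 429, 500, 502, 503, 504) if is_retryable(s)])
# [429, 500, 502, 503, 504]
```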

High-Volume Workloads

For batch processing or high-throughput pipelines:

  • Use monitor mode during initial rollout to observe traffic patterns without blocking

  • Parallelize requests: the SDK is safe for concurrent use, so a single Highflame instance can serve many concurrent requests

  • Use a project-scoped client so quota is tracked against the correct project identity
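Sharing one client across worker threads might look like the sketch below. StubClient and StubGuard are hypothetical stand-ins for a Highflame instance and its guard attribute so the example runs on its own; with the SDK you would construct the client once, as in the Python example earlier, and pass it in. Capping max_workers bounds how many requests are in flight at once, which helps keep bursts under your quota.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class StubGuard:
    """Hypothetical stand-in for client.guard; returns a fixed decision."""

    def evaluate_prompt(self, text: str) -> dict:
        return {"input": text, "decision": "allow"}


class StubClient:
    """Hypothetical stand-in for a single Highflame client instance."""

    def __init__(self) -> None:
        self.guard = StubGuard()


def evaluate_batch(client: StubClient, prompts: List[str], max_workers: int = 8) -> List[dict]:
    """Evaluate prompts concurrently with one shared client.

    max_workers caps in-flight requests; results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(client.guard.evaluate_prompt, prompts))


client = StubClient()  # create once, share across all worker threads
results = evaluate_batch(client, [f"prompt {i}" for i in range(20)])
print(len(results))  # 20
```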

Quota Increases

To request higher quota limits, contact your Highflame account team through the Highflame console or your account manager. Include your account ID and a description of your expected request volume.
