Error Medic

Troubleshooting OpenAI API Errors: 429 Rate Limits, 401s, and 5xx Timeouts

Comprehensive guide to resolving OpenAI API 429 Rate Limit Exceeded, 401 Unauthorized, and 5xx server errors with actionable retry logic and diagnostic scripts.

Key Takeaways
  • HTTP 429 (Rate Limit Exceeded) is the most common error, triggered by hitting tokens-per-minute (TPM) or requests-per-minute (RPM) limits.
  • Implement exponential backoff with jitter to gracefully handle 429 and transient 5xx server errors without overloading the API.
  • HTTP 401/403 errors usually indicate an invalid API key, missing organizational ID, or depleted pre-paid billing quota.
Fix Approaches Compared
Method                  | When to Use                               | Time       | Risk
Exponential Backoff     | 429 Rate Limits & 5xx Server Errors       | Minutes    | Low
API Key Rotation        | 401 Unauthorized / Compromised Keys       | Immediate  | Medium
Quota Increase          | 403 Insufficient Quota / Sustained Growth | Hours/Days | Low
Client Timeout Increase | Timeout / 502 Bad Gateway                 | Immediate  | Low

Understanding the Error

When building applications on top of the OpenAI API, encountering HTTP errors is inevitable. Due to the high computational cost of generative AI models, OpenAI enforces strict rate limits and occasionally experiences system-wide latency. Understanding the exact error codes—specifically 429, 401, 403, and the 5xx family—is critical for building resilient AI applications.

The 429 Rate Limit Exceeded Error

The most frequent stumbling block for developers is the 429 Too Many Requests error. The standard error payload looks like this:

{
  "error": {
    "message": "Rate limit reached for default-gpt-3.5-turbo in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

This occurs because OpenAI accounts are categorized into Usage Tiers (Free, Tier 1 through Tier 5). Each tier has specific limits for Requests Per Minute (RPM), Tokens Per Minute (TPM), and Tokens Per Day (TPD). If you burst traffic or process large batches of text, you will quickly hit the TPM limit.
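If you burst traffic past your tier's RPM cap, every extra request simply earns another 429. One option is to throttle client-side before a request ever leaves your process. The sketch below is a minimal sliding-window limiter in pure Python; the class name is our own, and the 3 RPM figure in the usage note mirrors the Free-tier limit in the payload above.

```python
import time
from collections import deque

class RpmThrottle:
    """Client-side sliding-window limiter: blocks until a request slot is free."""

    def __init__(self, max_requests_per_minute: int):
        self.limit = max_requests_per_minute
        self.sent = deque()  # monotonic timestamps of requests in the last 60s

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request may be sent."""
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # The oldest in-window request frees a slot when it turns 60s old.
        return 60.0 - (now - self.sent[0])

    def acquire(self) -> None:
        """Sleep if necessary, then record this request."""
        delay = self.wait_time(time.monotonic())
        if delay > 0:
            time.sleep(delay)
        self.sent.append(time.monotonic())
```

Call `throttle = RpmThrottle(3)` and `throttle.acquire()` before each API request; the process then paces itself instead of being paced by 429 responses.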

Step 1: Diagnose

  1. Check Response Headers: OpenAI returns x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and x-ratelimit-reset-requests headers, plus token-based counterparts (x-ratelimit-limit-tokens, x-ratelimit-remaining-tokens, x-ratelimit-reset-tokens). Inspect these to see exactly which limit you hit (tokens or requests).
  2. Review Usage Tier: Visit your OpenAI dashboard (platform.openai.com/account/limits) to verify your current tier. Moving from Free to Tier 1 requires setting up a paid account and purchasing at least $5 of credit.
  3. Identify Auth Errors (401/403): If you receive a 401 Unauthorized or 403 Forbidden error, the API is rejecting your credentials, typically with an "Incorrect API key provided" message. Note that quota-exhaustion messages ("You exceeded your current quota") may also arrive as a 429 with error code insufficient_quota.
  4. Monitor Server Errors (5xx): A 500 Internal Server Error or 502/503 indicates a problem on OpenAI's infrastructure. Check status.openai.com.
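The 429 payload shown earlier embeds a human-readable retry hint ("Please try again in 20s."). A small helper can turn that hint into a sleep duration. This is a sketch that assumes the message always uses the "try again in Ns" (or Nms) phrasing; the function name is our own.

```python
import re
from typing import Optional

def suggested_wait_seconds(error_message: str) -> Optional[float]:
    """Extract the 'Please try again in ...' hint from a 429 error message.

    Returns the wait in seconds, or None if the message carries no hint.
    """
    match = re.search(r"try again in (\d+(?:\.\d+)?)\s*(ms|s)", error_message)
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value
```

Prefer the rate-limit headers when they are available; this parser is a fallback for contexts where only the error body was logged.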

Step 2: Fix

1. Implement Exponential Backoff (429 & 5xx)

Never retry immediately. Use an exponential backoff strategy with jitter. This means waiting a short time (e.g., 1 second), then doubling the wait time for subsequent failures (2s, 4s, 8s), plus a random delay (jitter) to prevent the thundering herd problem.
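Before reaching for a library, the schedule described above can be sketched in a few lines of dependency-free Python; the function names are our own.

```python
import random
import time

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   jitter: float = 0.25):
    """Yield capped exponential delays (base, 2*base, 4*base, ...) plus jitter."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt)) + random.uniform(0.0, jitter)

def call_with_backoff(func, attempts: int = 5, retryable=(Exception,)):
    """Call func(); on a retryable error, sleep per the schedule and try again."""
    for attempt, delay in enumerate(backoff_delays(attempts)):
        try:
            return func()
        except retryable as exc:
            if attempt == attempts - 1:
                raise  # schedule exhausted; surface the last error
            time.sleep(delay)
```

In production you would narrow `retryable` to the SDK's RateLimitError and server-error classes rather than retrying every exception.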

2. Resolve Authentication and Quotas (401 & 403)

  • Verify your .env file is loaded correctly (e.g., using python-dotenv).
  • Log in to the billing dashboard and ensure your credit balance is greater than $0.
  • Regenerate the API key if you suspect it has been compromised or deleted.
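Before rotating keys, it helps to confirm the key actually reached your process and to log it safely. The helpers below are illustrative (the function names are our own); the "sk-" prefix check reflects the usual format of OpenAI secret keys.

```python
import os

def mask_key(key: str) -> str:
    """Render a secret key safe for logs: first 7 and last 4 characters only."""
    if len(key) <= 11:
        return "*" * len(key)
    return f"{key[:7]}...{key[-4:]}"

def check_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fail fast if the key is missing or obviously malformed."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set - check that your .env file is loaded")
    if not key.startswith("sk-"):
        raise RuntimeError(f"{env_var} does not look like an OpenAI key: {mask_key(key)}")
    return key
```

A malformed or empty key caught at startup saves a confusing 401 deep inside request-handling code.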

3. Mitigate Timeouts

If you are using the official Python or Node.js SDKs, the default timeout might be too aggressive for large generations. Override the default timeout parameter. For long completions, enable stream=True. This keeps the connection alive by sending chunks of data as they are generated, preventing idle network timeouts from load balancers or proxies.
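Assuming the official Python SDK, the pattern looks roughly like this; the 120-second timeout is an arbitrary example value, and join_deltas is our own helper.

```python
import os

def join_deltas(deltas) -> str:
    """Accumulate streamed content deltas; role/stop chunks carry None."""
    return "".join(d for d in deltas if d)

def stream_completion(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Stream a chat completion so idle-connection timeouts never trigger."""
    from openai import OpenAI  # imported here so join_deltas stays dependency-free

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        timeout=120.0,  # assumption: generous ceiling for long generations
    )
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # chunks arrive as they are generated
    )
    return join_deltas(chunk.choices[0].delta.content for chunk in stream)
```

Because data flows continuously, intermediaries with idle timeouts (load balancers, corporate proxies) never see a silent connection.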

Complete Diagnostic Script

python
import os
import time
import logging
from openai import OpenAI, RateLimitError, APIError, APITimeoutError
from tenacity import retry, stop_after_attempt, wait_random_exponential, retry_if_exception_type

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    timeout=60.0, # Increase timeout for large models
    max_retries=0 # Disable default retries to use Tenacity
)

# Configure resilient retry logic using Tenacity
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type((RateLimitError, APIError, APITimeoutError)),
    before_sleep=lambda retry_state: logger.warning(f"Retrying due to error: {retry_state.outcome.exception()}")
)
def generate_text_with_backoff(prompt: str, model="gpt-3.5-turbo") -> str:
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    except RateLimitError as e:
        logger.error(f"Rate limit exceeded (429): {e}")
        raise
    except APITimeoutError as e:
        logger.error(f"Request timed out: {e}")
        raise
    except APIError as e:
        # Handles 500, 502, 503 errors
        logger.error(f"OpenAI Server Error: {e}")
        raise

# Example usage:
if __name__ == "__main__":
    try:
        result = generate_text_with_backoff("Explain exponential backoff in one sentence.")
        print(result)
    except Exception as e:
        print(f"Final failure after retries: {e}")

Error Medic Editorial

Error Medic Editorial is a collective of senior Site Reliability Engineers and DevOps practitioners dedicated to providing actionable, code-first solutions for modern infrastructure and API integration challenges.
