Error Medic

How to Fix AWS API Rate Limit Exceeded and Timeout Errors (ThrottlingException)

Resolve AWS API rate limits (ThrottlingException) and timeouts using exponential backoff, jitter, and AWS Service Quotas to ensure reliable cloud deployments.

Key Takeaways
  • AWS API limits protect infrastructure, resulting in ThrottlingException or HTTP 429 Too Many Requests when limits are exceeded.
  • API Timeouts (HTTP 504) frequently co-occur when retry mechanisms exhaust connection pools or wait times.
  • Immediate mitigation requires implementing exponential backoff with jitter in your AWS SDK retry strategy.
  • Long-term stability requires architecture shifts: caching reads, using batch APIs, and migrating from polling to event-driven architectures.
AWS API Rate Limit Fix Approaches Compared
| Method | When to Use | Time to Implement | Risk Level |
|---|---|---|---|
| Exponential Backoff & Jitter | Immediate fix for intermittent ThrottlingExceptions across all services. | 10-30 mins | Low |
| AWS Service Quota Increase | Consistent throttling despite robust backoff and optimized client-side logic. | 1-2 days (AWS approval) | Low |
| API Call Batching/Caching | High-volume reads (SSM) or writes (SQS) that can be easily consolidated. | Days to weeks | Medium |
| Event-Driven Architecture | Systemic limits reached; synchronous polling causing massive overhead. | Weeks to months | High |

Understanding AWS API Rate Limits and Timeouts

When building robust, cloud-native applications or managing infrastructure as code on Amazon Web Services (AWS), you will inevitably encounter situations where your requests are suddenly rejected. Among the most frustrating and common of these interruptions are AWS API Rate Limits and AWS API Timeouts.

These errors represent a critical intersection between system design, network reliability, and cloud resource management. Understanding the exact nature of these errors, how they interrelate, and how to programmatically and architecturally resolve them is a fundamental skill for any Site Reliability Engineer (SRE) or DevOps professional.

The Mechanics of AWS API Throttling

AWS operates a massive, multi-tenant global infrastructure. To protect the underlying control planes and data planes from being overwhelmed by runaway processes, malicious attacks, or simply poorly optimized code, AWS employs sophisticated API throttling mechanisms.

Throttling is the process of limiting the number of API requests a user or account can make within a specific timeframe (usually measured in Requests Per Second, or RPS). When your application exceeds this permitted throughput for a particular service endpoint in a specific region, AWS acts defensively. It rejects subsequent incoming requests and returns a ThrottlingException or an HTTP 429 Too Many Requests status code.

It is crucial to understand that API limits are generally decoupled from your account's billing or overall resource limits (like the maximum number of EC2 instances you can run). You might have the budget and quota to run 10,000 EC2 instances, but if you try to query their status all at once using DescribeInstances with 5,000 RPS, you will be heavily throttled.

Why Do Timeouts Occur Simultaneously?

You might often see API timeouts (HTTP 504 Gateway Timeout, ReadTimeoutError, or ConnectTimeoutError) occurring in tandem with rate limits. This co-occurrence is not a coincidence and typically stems from the following scenarios:

  1. Exhausted Connection Pools: When your application makes an API call that gets throttled, the AWS SDK's built-in retry mechanism kicks in. If you have a high volume of requests, the retries can stack up. The SDK maintains a connection pool, and if all connections are busy waiting for backoff timers to expire or waiting for sluggish responses from a congested endpoint, new requests will fail to acquire a connection within the timeout window, leading to a timeout error.
  2. Network Queueing and Dropped Packets: In extreme cases of throttling at the AWS perimeter, network load balancers or API gateways might become so saturated with rejecting requests that they drop connections or fail to respond within the client's configured read timeout threshold.
  3. Application Thread Starvation: If your application uses synchronous, blocking calls to the AWS API, a sudden wave of throttling means threads are suspended waiting for retries. This can lead to thread starvation, where the application becomes unresponsive and internal health checks or upstream load balancers report timeouts.

Common Error Messages in the Wild

Identifying the exact error signature is the first step in troubleshooting. You will typically find these errors in your application logs, AWS Lambda execution logs, or CI/CD pipeline outputs.

  • Boto3 (Python): botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeInstances operation: Rate exceeded
  • AWS SDK for Go: awserr.Error: Throttling: Rate exceeded
  • AWS CLI: An error occurred (Throttling) when calling the GetParameter operation (reached max retries: 4): Rate exceeded
  • Raw HTTP Response: HTTP/1.1 429 Too Many Requests
  • Timeout (Python): botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"
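Because the throttling error code varies by service, it is more robust to match on the code than on the message string. A minimal sketch of that pattern (the helper name and code list are ours; the error-response shape mirrors the dict botocore attaches to ClientError as `e.response`):

```python
# Throttling surfaces under several error codes depending on the service.
THROTTLING_CODES = {
    "Throttling",
    "ThrottlingException",
    "RequestThrottled",
    "TooManyRequestsException",
}

def is_throttling_error(error_response):
    """Return True if a botocore-style error response indicates throttling.

    `error_response` mirrors the dict found on
    botocore.exceptions.ClientError as `e.response`.
    """
    code = error_response.get("Error", {}).get("Code", "")
    return code in THROTTLING_CODES
```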

Step 1: Diagnose the Source of the Throttling

Blindly increasing retries is a band-aid. You must pinpoint the exact API, the IAM identity making the calls, and the pattern of the usage spikes.

1. Analyze Amazon CloudWatch Metrics

AWS provides built-in metrics to track API usage.

  • Navigate to the CloudWatch console.
  • Select Metrics -> All metrics.
  • Look for the AWS/Usage namespace.
  • Search for the CallCount metric. You can filter by Service and Resource to identify which specific APIs are spiking. If you see sharp, vertical lines in the graph, you have identified a bursty traffic pattern that is likely triggering the limits.
2. Deep Dive with AWS CloudTrail

CloudWatch tells you that throttling is happening; CloudTrail tells you who and what is causing it.

  • Go to the CloudTrail console and view Event history.
  • Filter by Event name or Error code. Look specifically for ThrottlingException or Throttling.
  • For a more comprehensive analysis, especially across multiple regions or accounts, querying CloudTrail logs stored in Amazon S3 using Amazon Athena is highly recommended.
3. Identify Problematic Application Patterns

Once you know the API and the identity, review the code or system triggering the calls. Common culprits include:

  • Unoptimized CI/CD Pipelines: Scripts that poll DescribeStackEvents aggressively to check CloudFormation deployment status without backoff.
  • Overzealous Auto Scaling: Rapid scaling events triggering thousands of parameter lookups in SSM Parameter Store simultaneously.
  • Poorly Written Cron Jobs: Scheduled tasks that wake up and attempt to process thousands of records across AWS services concurrently instead of batching them.

Step 2: Implement Robust Solutions

Fixing AWS API rate limits requires a multi-layered approach, ranging from immediate client-side mitigation to long-term architectural redesigns.

Fix Approach A: Implement Exponential Backoff with Jitter (Immediate Mitigation)

The most critical and immediate fix is to ensure your client application handles 429 and 5xx responses gracefully. Simply retrying immediately (a tight loop) will only worsen the situation and keep you throttled longer.

Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process. In the context of API calls, if the first request fails, you wait 1 second before retrying. If that fails, you wait 2 seconds, then 4 seconds, then 8 seconds, up to a maximum delay.

However, standard exponential backoff has a flaw: the Thundering Herd Problem. If 100 Lambda functions all fail at the exact same millisecond and use the exact same backoff formula, they will all retry together at 1 second, then together at 2 seconds, effectively creating synchronized spikes that guarantee further throttling.

The solution is Jitter—adding a randomized delay to the backoff equation. This spreads the retries out over time, smoothing the load on the AWS endpoint. Most modern AWS SDKs implement this automatically, but the default configuration is often too conservative for high-throughput applications.
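The "full jitter" variant can be sketched in a few lines of plain Python (the base delay and cap below are illustrative, not AWS defaults):

```python
import random

def backoff_delay(attempt, base=1.0, cap=20.0):
    """Full-jitter exponential backoff: return a random wait between
    0 and min(cap, base * 2**attempt) seconds before retry `attempt`."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Example: compute the wait before each of the first five retries.
delays = [backoff_delay(n) for n in range(5)]
```

In production, prefer the SDK's built-in `standard` or `adaptive` retry modes over a hand-rolled loop; the sketch only illustrates why randomization breaks up synchronized retry spikes.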

Fix Approach B: Request an AWS Service Quota Increase (Medium-Term)

If your application's architecture is sound, your retry logic is flawless, and you still consistently hit rate limits because your business fundamentally requires high API throughput, you need to request a quota increase.

  1. Navigate to the Service Quotas console in AWS.
  2. Select the relevant AWS service (e.g., AWS Systems Manager).
  3. Search for the specific API limit. Note that API rate limits are often labeled as "API requests per second" or "Throughput limits."
  4. Select the limit and click Request quota increase.
  5. Provide the new desired value. Crucially, provide a detailed justification. AWS Support reviews these requests to ensure you aren't masking bad architecture. Explain your use case, the impact of the throttling, and confirm you have implemented exponential backoff.

Note: Some limits are "hard limits" and cannot be increased to protect the overall stability of the AWS region. If you hit a hard limit, you MUST redesign your architecture.

Fix Approach C: Architectural Redesign & Optimization (Long-Term)

The most sustainable way to avoid API rate limits is to stop making so many API calls.

1. Implement Aggressive Caching Strategy If your application frequently reads configuration data, secrets, or state that rarely changes, cache it locally.

  • SSM Parameter Store / Secrets Manager: Use the AWS Parameter and Secrets Lambda Extension, or implement a local caching layer (e.g., Redis, Memcached, or simple in-memory cache) with a reasonable Time-To-Live (TTL). If a configuration changes once a week, querying it 100 times a second is an anti-pattern.
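A minimal in-process TTL cache for such lookups might look like the following (a sketch: the fetch callback and TTL are illustrative, and for Lambda the AWS-provided extension or a shared cache like Redis is usually preferable):

```python
import time

class TTLCache:
    """Tiny in-memory cache: refetch a value only after `ttl` seconds."""

    def __init__(self, fetch, ttl=300.0):
        self._fetch = fetch      # e.g. a function wrapping ssm.get_parameter
        self._ttl = ttl
        self._store = {}         # key -> (value, expiry_timestamp)

    def get(self, key):
        value, expiry = self._store.get(key, (None, 0.0))
        if time.monotonic() >= expiry:
            # Entry missing or expired: hit the backend once, then cache.
            value = self._fetch(key)
            self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```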

2. Shift to Event-Driven Patterns Polling is the enemy of API limits. If you have a process running a while True: loop calling DescribeInstances to wait for a server to boot, you will quickly get throttled.

  • Use Amazon EventBridge: Almost all AWS state changes emit events to EventBridge. Instead of polling, configure an EventBridge rule to trigger a Lambda function, SQS queue, or SNS topic exactly when the state changes. This reduces your API calls from hundreds to zero.
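For the boot-waiting example above, an EventBridge rule with a pattern like this one fires exactly when an instance enters the running state, with no polling at all (pattern shape per the standard EC2 state-change event; the rule's target is up to you):

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}
```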

3. Utilize Batch APIs Where possible, bundle multiple operations into a single API call.

  • Instead of calling sqs:SendMessage 100 times in a loop, use sqs:SendMessageBatch which can process up to 10 messages per call, cutting your API request volume by 90%.
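A small chunking helper keeps each call under the 10-entry limit (a sketch; the SQS client wiring and queue URL are assumed):

```python
def chunk(items, size=10):
    """Yield successive lists of at most `size` items, matching
    SendMessageBatch's 10-entry-per-call limit."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage sketch (assumes a configured boto3 SQS client and queue URL):
# for batch in chunk(messages):
#     sqs.send_message_batch(
#         QueueUrl=queue_url,
#         Entries=[{"Id": str(n), "MessageBody": m} for n, m in enumerate(batch)],
#     )
```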

Conclusion

AWS API Rate Limits and Timeouts are protective mechanisms, not punishments. By correctly interpreting ThrottlingException errors, utilizing CloudWatch and CloudTrail for diagnostics, implementing robust retry logic with jitter, and ultimately shifting towards event-driven, cached architectures, DevOps and SRE teams can ensure their cloud infrastructure operates smoothly and scales reliably without interruption.

Code Examples: Retry Configuration and Throttling Diagnostics

A custom Boto3 retry configuration using adaptive mode:

```python
import boto3
from botocore.config import Config

# Configure a custom retry strategy with adaptive mode.
# Adaptive mode throttles client-side to avoid hitting server limits.
boto_config = Config(
    retries={
        'max_attempts': 10,  # Increase maximum retry attempts
        'mode': 'adaptive'   # Enable smart client-side throttling
    },
    connect_timeout=5,       # Fail fast on network connection issues
    read_timeout=15          # Allow sufficient time for API response
)

try:
    # Initialize the AWS client with the custom configuration.
    # Example: an SSM client for Parameter Store lookups.
    ssm_client = boto3.client('ssm', region_name='us-east-1', config=boto_config)

    # Example API call that is frequently throttled if polled rapidly
    response = ssm_client.get_parameter(Name='/config/myapp/db_url')
    print("Successfully retrieved parameter.")

except Exception as e:
    print(f"API call failed after max retries exhausted: {e}")
```

An Amazon Athena query over CloudTrail logs stored in S3 to find the top throttled APIs in your account:

```sql
SELECT
    eventsource,
    eventname,
    count(*) AS throttle_count
FROM cloudtrail_logs
WHERE errorcode = 'ThrottlingException'
    AND eventtime > '2023-10-01T00:00:00Z'
GROUP BY eventsource, eventname
ORDER BY throttle_count DESC
LIMIT 10;
```
