Fixing 'ThrottlingException: Rate exceeded' and AWS API Timeouts
Resolve AWS API rate limits (ThrottlingException) and timeouts by implementing exponential backoff, jitter, and requesting service quota increases.
- Root Cause 1: Exceeding the allowed API request rate (Requests Per Second) for a specific AWS service, triggering a ThrottlingException.
- Root Cause 2: Network latency or backend service degradation causing the AWS SDK to hit its configured read/connect timeout threshold.
- Quick Fix Summary: Implement exponential backoff with jitter in your retry logic, and request a Service Quota increase if the baseline traffic genuinely exceeds defaults.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Exponential Backoff & Jitter | Client-side mitigation for bursty traffic causing intermittent ThrottlingExceptions. | Medium | Low |
| Service Quota Increase | Sustained high traffic that consistently hits the default API limits. | Slow (AWS Approval) | None |
| Caching API Responses | Read-heavy workloads polling the same AWS resources (e.g., DescribeInstances). | Medium | Medium (Stale Data) |
| Tuning SDK Timeouts | Addressing 'Connect timeout on endpoint' or 'Read timeout' errors. | Fast | Low |
Understanding the Error
When interacting with Amazon Web Services (AWS) via the CLI, SDKs, or direct API calls, you may encounter rate-limiting and timeout errors. These mechanisms are designed to protect AWS infrastructure from abuse and ensure fair usage among all tenants. However, for high-throughput applications, they often surface as sudden, disruptive failures.
The most common error messages you will encounter are:
- `ThrottlingException: Rate exceeded`
- `ProvisionedThroughputExceededException` (DynamoDB specific)
- `botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the [API] operation: Rate exceeded`
- `Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"`
- `Connect timeout on endpoint URL`
AWS APIs operate on a token bucket algorithm. You are granted a specific number of tokens (API calls) per second, with a burst capacity. Once the bucket is empty, subsequent requests are dropped, and AWS returns an HTTP 400 (Bad Request) or HTTP 503 (Service Unavailable) status with a throttling error. Timeouts, on the other hand, usually occur when the SDK waits longer for a response or connection than its configured threshold, which can be caused by network drops or transient AWS backend latency.
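The token bucket model can be sketched in a few lines of Python. This is an illustrative simplification, not AWS's actual implementation; real refill rates and burst capacities vary by service, API operation, and account:

```python
import time

class TokenBucket:
    """Illustrative token bucket: 'rate' tokens/sec refill, 'burst' max capacity."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # bucket starts full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens accrued since the last check, capped at burst capacity
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # request proceeds
        return False       # bucket empty: request is throttled

# Illustration: no refill (rate=0) and a burst of 2 drains after two calls
bucket = TokenBucket(rate=0.0, burst=2)
print([bucket.allow() for _ in range(3)])  # → [True, True, False]
```

Once the bucket drains, every further call fails until the refill rate replenishes it, which is why a tight retry loop without backoff only makes throttling worse.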
Step 1: Diagnose the Bottleneck
Before applying a fix, you must determine whether you are dealing with a hard rate limit or a network timeout. You can use AWS CloudTrail and Amazon CloudWatch to analyze the API calls.
- Analyze CloudWatch Metrics: Navigate to CloudWatch and check the `Usage` metrics for the specific service. Look for `CallCount` and compare it against your known quotas.
- Inspect SDK Logs: Enable debug logging in your AWS SDK (e.g., `boto3.set_stream_logger('')` in Python). Look for the HTTP status codes. A `429 Too Many Requests` or `400 Bad Request` with `ThrottlingException` confirms a rate limit. A timeout will usually throw a lower-level socket or urllib3 exception.
- Check Service Quotas: Go to the AWS Service Quotas console to see your current limits for the affected API operation. For example, the `DescribeInstances` API in EC2 has a strict default limit.
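When inspecting SDK exceptions programmatically during diagnosis, it helps to classify the error code before deciding whether to retry. A minimal sketch; the set of codes below covers common throttling variants but is not exhaustive, and some services use their own codes:

```python
# Error codes AWS services commonly return when a request is rate limited.
# (An assumption based on common variants, not an exhaustive list.)
THROTTLING_CODES = {
    'ThrottlingException',
    'Throttling',
    'TooManyRequestsException',
    'RequestLimitExceeded',
    'ProvisionedThroughputExceededException',  # DynamoDB
}

def is_throttling_error(error_response):
    """Return True if a botocore-style error response indicates rate limiting."""
    return error_response.get('Error', {}).get('Code') in THROTTLING_CODES

# Usage with boto3 (requires credentials and a configured client):
#   try:
#       ec2_client.describe_instances()
#   except ClientError as e:
#       if is_throttling_error(e.response):
#           ...  # back off and retry
print(is_throttling_error({'Error': {'Code': 'ThrottlingException'}}))  # → True
```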
Step 2: Implement Client-Side Resiliency (Exponential Backoff)
The standard, AWS-recommended approach to handling ThrottlingException is implementing exponential backoff with jitter. Most modern AWS SDKs (like Boto3 for Python or the AWS SDK for Node.js) have built-in retry mechanisms, but they may need tuning for high-concurrency environments.
When a request fails due to rate limiting, the client should wait a short amount of time before retrying. If the retry fails, the wait time is doubled (exponential backoff). To prevent the "thundering herd" problem where multiple clients retry at the exact same millisecond, you add "jitter" (randomized delay) to the wait time.
If you are writing custom HTTP clients or need aggressive retry policies, you must implement this logic manually. See the Code Block section for a robust Python example.
Step 3: Request a Service Quota Increase
If your application legitimately requires a higher API request rate than the default limits, you must request a quota increase. This is not an immediate fix; it requires AWS support approval.
- Open the Service Quotas console in AWS.
- Select the AWS service (e.g., Amazon EC2).
- Search for the specific API or quota (e.g., `DescribeInstances`, `RunInstances`).
- Select the quota and click Request quota increase.
- Enter the desired value and provide a clear, detailed business justification. AWS Support will review the request and may ask for architectural details to ensure you aren't masking a poorly optimized application.
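The console steps above can also be scripted through the Service Quotas API. A hedged sketch: the quota code `L-XXXXXXXX` is a placeholder, since real quota codes are service-specific and should be looked up with the `ListServiceQuotas` API rather than hard-coded:

```python
def build_quota_increase_request(service_code, quota_code, desired_value):
    """Build the kwargs for the Service Quotas RequestServiceQuotaIncrease call.

    Quota codes (the 'L-...' identifiers) are service-specific; list them
    first via ListServiceQuotas instead of guessing.
    """
    return {
        'ServiceCode': service_code,
        'QuotaCode': quota_code,
        'DesiredValue': float(desired_value),
    }

# Usage with boto3 (requires credentials; submits a real support request):
#   sq = boto3.client('service-quotas')
#   sq.request_service_quota_increase(
#       **build_quota_increase_request('ec2', 'L-XXXXXXXX', 50))
```

Scripting the request does not skip the review: AWS still evaluates the justification before approving the new limit.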
Step 4: Optimize API Usage (Caching and Batching)
Often, rate limits are hit because of inefficient API usage rather than pure scale.
- Caching: If multiple microservices frequently call `sts:AssumeRole` or `ec2:DescribeInstances`, implement a local cache (like Redis or an in-memory TTL cache) to store the results. AWS responses rarely change second-by-second.
- Batching: Instead of iterating through a list of 100 instance IDs and calling `DescribeInstances` 100 times, pass all 100 IDs in a single `DescribeInstances` API call. Most AWS 'Describe' APIs support bulk queries.
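A caching layer does not need to be elaborate. A minimal in-memory TTL cache sketch; the 30-second TTL and the `ec2_client` call in the usage comment are illustrative assumptions, not recommendations:

```python
import time

class TTLCache:
    """Tiny in-memory TTL cache for slow-changing AWS API responses."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]                        # cache hit: no API call
        value = fetch()                            # cache miss: one real call
        self._store[key] = (now + self.ttl, value)
        return value

# Usage (hypothetical): serve DescribeInstances from cache for 30 seconds
#   cache = TTLCache(ttl_seconds=30)
#   reservations = cache.get_or_fetch(
#       'all-instances', lambda: ec2_client.describe_instances())

# Illustration with a counter standing in for the API call:
calls = []
cache = TTLCache(ttl_seconds=60)
cache.get_or_fetch('k', lambda: calls.append(1) or 'v')
result = cache.get_or_fetch('k', lambda: calls.append(1) or 'v')
print(result, len(calls))  # → v 1  (second lookup served from cache)
```

Even a short TTL collapses N identical polls per interval into one upstream request, which is often enough to stay inside the default token bucket.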
Step 5: Tuning SDK Timeouts
If you are experiencing timeouts (Read timeout or Connect timeout), the issue is often related to the SDK's default configuration or the network path (e.g., NAT Gateway exhaustion).
AWS SDKs have two primary timeout settings:
- Connect Timeout: The time the SDK will wait to establish a TCP connection to the AWS endpoint.
- Read Timeout: The time the SDK will wait for a response from the server after the connection is established.
You can override these defaults. For instance, in Boto3, you use the botocore.config.Config object. In heavily loaded Lambda functions or containers running in congested subnets, slightly increasing the connect timeout (e.g., from 1 second to 5 seconds) can resolve intermittent failures.
Code Block
```python
import time
import random

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# 1. Configuring SDK timeouts and built-in retries.
# The 'adaptive' retry mode dynamically limits the client-side request rate.
custom_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    },
    connect_timeout=5,  # Increase connect timeout to 5 seconds
    read_timeout=60     # Increase read timeout to 60 seconds
)

ec2_client = boto3.client('ec2', config=custom_config)

# 2. Manual exponential backoff with jitter (if not relying on the SDK).
# Covers the common throttling error-code variants, not just ThrottlingException.
THROTTLING_CODES = ('ThrottlingException', 'Throttling', 'RequestLimitExceeded',
                    'ProvisionedThroughputExceededException')

def call_aws_with_backoff(api_func, *args, **kwargs):
    max_retries = 5
    base_delay = 1  # seconds
    for attempt in range(max_retries):
        try:
            return api_func(*args, **kwargs)
        except ClientError as e:
            if e.response['Error']['Code'] in THROTTLING_CODES:
                if attempt == max_retries - 1:
                    raise  # Re-raise if max retries reached
                # Delay: (2 ^ attempt) * base_delay, plus random jitter
                delay = (2 ** attempt) * base_delay
                jitter = random.uniform(0, 0.5 * delay)
                sleep_time = delay + jitter
                print(f"Throttled. Retrying in {sleep_time:.2f} seconds...")
                time.sleep(sleep_time)
            else:
                raise  # Re-raise non-throttling errors

# Example usage of manual backoff:
# response = call_aws_with_backoff(ec2_client.describe_instances)
```

Error Medic Editorial
The Error Medic Editorial team consists of senior Site Reliability Engineers and Cloud Architects dedicated to documenting obscure infrastructure edge cases and providing actionable, production-ready solutions.