Error Medic

Fixing AWS API Rate Limit Exceeded and Timeout Errors (ThrottlingException)

Resolve AWS API rate limits (ThrottlingException) and timeouts with exponential backoff, jitter, and Service Quotas. Step-by-step SRE troubleshooting guide.

Key Takeaways
  • Root cause 1: Burst API requests exceeding account-level or service-level API call limits, resulting in HTTP 429 ThrottlingException.
  • Root cause 2: Network latency, socket exhaustion, or backend AWS service degradation causing RequestTimeout (HTTP 408/504).
  • Quick fix: Change AWS SDK retry mode to 'adaptive', increase max_attempts, and implement exponential backoff with jitter for custom polling scripts.
  • Long-term fix: Migrate from API polling to EventBridge-driven architectures and request API limit increases via AWS Service Quotas.
Fix Approaches Compared
Method | When to Use | Time to Implement | Risk/Impact
SDK Retry Config (Adaptive) | Immediate fix for SDK/CLI timeouts and minor throttling. | 5 mins | Low
Exponential Backoff + Jitter | Custom API scripts or bash automation hitting hard rate limits. | 30 mins | Low
AWS Service Quota Increase | Sustained high-volume architectural needs (e.g., heavy EC2 scaling). | 1-2 days | Low
Event-Driven (EventBridge) | Replacing heavy List/Describe API polling workloads. | 1-2 weeks | Medium

Understanding the Error: ThrottlingException and RequestTimeout

When interacting with AWS services via the CLI, SDKs (boto3, AWS SDK for Go/Node.js), or direct API calls, DevOps engineers frequently encounter ThrottlingException (HTTP 429 Too Many Requests) or RequestTimeout (HTTP 408/504). AWS enforces API limits globally to ensure fair usage and to protect its control plane from localized DDoS attacks or runaway automation scripts. These limits are evaluated using a token bucket algorithm, and they vary heavily between read operations (e.g., DescribeInstances) and mutating operations (e.g., RunInstances).
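
The token bucket model can be sketched in a few lines of Python. This is a simplified illustration only: AWS does not publish per-API bucket sizes or refill rates, so the capacity and rate below are assumptions chosen to show the behavior.

```python
import time

class TokenBucket:
    """Simplified token bucket: a burst larger than `capacity` gets
    throttled even when the long-run average rate is sustainable."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request allowed
        return False      # would surface as a 429 ThrottlingException

# A tight burst of 25 calls against a 20-token bucket: the tail of the
# burst is rejected even though 10 requests/sec would be fine on average.
bucket = TokenBucket(capacity=20, refill_rate=10)
results = [bucket.try_acquire() for _ in range(25)]
```

This is why a script that is "only" doing a few hundred calls per minute can still be throttled: it is the instantaneous burst, not the average, that drains the bucket.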

Identifying the Exact Error Messages

You will typically see standard output or application logs reflecting:

  • botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeInstances operation: Rate exceeded
  • RequestLimitExceeded: AWS API rate limit exceeded
  • TimeoutError: Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"

Step 1: Diagnose the Root Cause

Before indiscriminately applying retries, determine if this is a sustained volume issue or a micro-burst issue.

  1. Analyze CloudTrail Logs: Query CloudTrail using Amazon Athena to identify which IAM role, user, or specific script is generating the most requests. Look for spikes in EventName associated with the throttled service.
  2. Review CloudWatch Metrics: Check the Usage namespace for CallCount metrics. You can also monitor the ClientErrors metric on your NAT Gateways or API Gateways to correlate internal network timeouts with AWS API limits.
  3. Determine Timeout vs. Throttle: If timeouts accompany rate limits, your application's connection pool might be exhausting its sockets while waiting for AWS to respond to repeatedly throttled requests.
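
To see which principal is draining the token bucket, you can aggregate throttle errors per user from CloudTrail output. A sketch, assuming records shaped like the `Events` array returned by `aws cloudtrail lookup-events` (the sample data below is illustrative, not real CloudTrail output):

```python
import json
from collections import Counter

def top_throttled_principals(events):
    """Count ThrottlingException / RequestLimitExceeded errors per
    principal from CloudTrail lookup-events records."""
    counts = Counter()
    for event in events:
        detail = json.loads(event["CloudTrailEvent"])
        if detail.get("errorCode") in ("ThrottlingException", "RequestLimitExceeded"):
            counts[event.get("Username", "unknown")] += 1
    return counts.most_common()

# Illustrative sample shaped like `aws cloudtrail lookup-events` output
sample = [
    {"Username": "ci-runner",
     "CloudTrailEvent": json.dumps({"errorCode": "ThrottlingException"})},
    {"Username": "ci-runner",
     "CloudTrailEvent": json.dumps({"errorCode": "ThrottlingException"})},
    {"Username": "dev-laptop",
     "CloudTrailEvent": json.dumps({"eventName": "DescribeInstances"})},
]
```

Feeding real lookup-events output through this quickly points to the IAM role or script to fix first.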

Step 2: Configure AWS SDK Retry and Adaptive Modes

The standard AWS SDKs implement basic exponential backoff, but under heavy concurrency this isn't enough. AWS introduced the adaptive retry mode for exactly this case. Unlike standard mode (which backs off exponentially but defaults to only three total attempts), adaptive mode dynamically adjusts the client-side request rate based on the throttled responses received from AWS, effectively smoothing out your request spikes locally before they hit the AWS control plane.

Set this in your environment:

export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10
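
In Python, the same settings map onto the botocore retry configuration. The sketch below only builds the configuration dictionary (so it runs without AWS credentials or the boto3 package); in real code you would pass it to `botocore.config.Config(retries=...)` and hand that to `boto3.client`, as shown in the comments.

```python
# Retry configuration in the shape botocore expects. Real usage:
#   from botocore.config import Config
#   import boto3
#   ec2 = boto3.client("ec2", config=Config(retries=RETRIES))
RETRIES = {
    "max_attempts": 10,  # total attempts, including the first call
    "mode": "adaptive",  # client-side rate limiting on top of backoff
}

def describe_retry_mode(mode):
    """Summarize the behavior of each botocore retry mode."""
    return {
        "legacy": "older per-service behavior with limited error coverage",
        "standard": "exponential backoff, 3 total attempts by default",
        "adaptive": "standard behavior plus client-side rate limiting",
    }[mode]
```

The same two knobs can also be set per-profile in `~/.aws/config` (`retry_mode` and `max_attempts`), which is often easier to roll out across a fleet of scripts.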

Step 3: Implement Exponential Backoff with Jitter

If you are writing custom HTTP clients or bash scripts that don't benefit from the AWS SDK's native retry handlers, you must implement exponential backoff with jitter. Standard exponential backoff causes the "Thundering Herd" problem, where multiple blocked threads retry at the exact same exponential intervals. Adding jitter (randomized delay) spreads out the retries, increasing the likelihood of successful token acquisition in the AWS API bucket.
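
The "full jitter" variant of this looks as follows in Python. This is a sketch: `call` stands in for whatever throttled operation you are wrapping, and `ThrottledError` is a placeholder for your client's throttling exception.

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for an SDK/HTTP throttling error (illustrative)."""

def retry_with_full_jitter(call, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Retry `call` on throttling, sleeping a random duration between 0
    and min(cap, base * 2**attempt). Randomizing over the full window
    spreads blocked clients out instead of letting them all retry at
    the same exponential instants (the Thundering Herd problem)."""
    for attempt in range(max_retries):
        try:
            return call()
        except ThrottledError:
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
    return call()  # final attempt; any exception propagates to the caller

```

Injecting `sleep` as a parameter keeps the helper testable; in production you simply leave the default `time.sleep`.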

Step 4: Request a Service Quota Increase

If your baseline architecture naturally exceeds AWS defaults (e.g., a heavily multi-tenant SaaS aggressively polling SQS, or a Kubernetes cluster-autoscaler constantly describing EC2 Auto Scaling groups), request a quota increase. Navigate to the AWS Service Quotas console, locate the specific API operation, and submit a request. Note: STS and IAM API limits (such as sts:AssumeRole) are hard-capped and generally cannot be increased; you must optimize your token caching instead.
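
Since AssumeRole call rates cannot simply be raised, the fix is to stop assuming the role on every request and reuse credentials until shortly before they expire. A minimal caching sketch, with assumptions labeled: `fetch` is a hypothetical callable standing in for your `sts.assume_role` call, and real code would read the expiry from the response's `Credentials.Expiration` field rather than a fixed TTL.

```python
import time

class CachedCredentials:
    """Reuse a credential set until shortly before expiry, so a hot
    code path does not hit sts:AssumeRole on every request."""

    def __init__(self, fetch, ttl_seconds=3600, refresh_margin=300,
                 clock=time.monotonic):
        self._fetch = fetch            # hypothetical: returns fresh credentials
        self._ttl = ttl_seconds        # assumed session lifetime
        self._margin = refresh_margin  # refresh this many seconds early
        self._clock = clock
        self._creds = None
        self._expires_at = 0.0

    def get(self):
        if self._creds is None or self._clock() >= self._expires_at - self._margin:
            self._creds = self._fetch()
            self._expires_at = self._clock() + self._ttl
        return self._creds
```

Refreshing a few minutes early (`refresh_margin`) avoids a race where credentials expire mid-request. Note that boto3's own `assume_role` credential provider already does this caching for you when configured via profiles; the sketch is for custom clients that call STS directly.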

Step 5: Architectural Fixes (Event-Driven Design)

Stop polling. The ultimate SRE fix for rate limits is architectural. If your application constantly calls Describe* or List* APIs to check for state changes, introduce a caching layer like Amazon ElastiCache (Redis) or transition to an event-driven architecture. Use Amazon EventBridge to listen for state changes (e.g., EC2 instance state changes, ECS task terminations) and push those updates to a local database. Query your local database instead of hitting the AWS API control plane.
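
For the EC2 example, the EventBridge rule that replaces DescribeInstances polling matches state-change events with a pattern like the one below (shown as a Python dict you would serialize to JSON for the rule's event pattern; the tiny `matches` helper is an illustrative subset of EventBridge matching, not the real engine).

```python
# Matches EC2 instance state transitions instead of polling DescribeInstances.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "terminated"]},
}

def matches(pattern, event):
    """Illustrative subset of EventBridge matching: every pattern key
    must appear in the event with a value from the allowed list."""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):
            if not matches(allowed, event.get(key, {})):
                return False
        elif event.get(key) not in allowed:
            return False
    return True
```

Point the rule's target at a Lambda function that writes the new state into your local store, and your application reads state with zero calls to the EC2 control plane.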


# Set AWS CLI retry configuration globally to adaptive mode
aws configure set default.retry_mode adaptive
aws configure set default.max_attempts 10

# Diagnose rate limits using CloudTrail and jq (requires jq installed)
# Note: "date -v-1H" is BSD/macOS syntax; on GNU/Linux use: date -d '1 hour ago' +%s
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeInstances \
  --start-time $(date -v-1H +%s) \
  --max-results 50 | jq '.Events[] | {Time: .EventTime, User: .Username, Error: .CloudTrailEvent | fromjson | .errorCode}' | grep Throttling

# SRE Bash wrapper: Exponential backoff with jitter for raw API calls
MAX_RETRIES=5
RETRY_COUNT=0
BASE_DELAY=1

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
  # Execute your AWS CLI command here
  OUTPUT=$(aws ec2 describe-instances 2>&1)
  EXIT_CODE=$?
  
  if [[ $EXIT_CODE -eq 0 ]]; then
    echo "API call succeeded!"
    break
  elif [[ "$OUTPUT" == *"ThrottlingException"* ]] || [[ "$OUTPUT" == *"Timeout"* ]]; then
    # Calculate random jitter between 0 and current BASE_DELAY
    JITTER=$(awk -v min=0 -v max=$BASE_DELAY 'BEGIN{srand(); print min+rand()*(max-min)}')
    
    echo "Rate limited or timed out. Retrying in $JITTER seconds..."
    sleep $JITTER
    
    # Exponential increase for the next loop
    BASE_DELAY=$((BASE_DELAY * 2))
    RETRY_COUNT=$((RETRY_COUNT + 1))
  else
    echo "Critical Error encountered: $OUTPUT"
    exit 1
  fi
done

if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
  echo "Failed after $MAX_RETRIES attempts."
  exit 1
fi

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers, AWS Certified Solutions Architects, and Site Reliability Engineers dedicated to untangling complex cloud infrastructure bottlenecks.
