Error Medic

Fixing AWS API Rate Limit Exceeded and Timeout Errors (ThrottlingException)

Resolve AWS API rate limits (ThrottlingException) and timeouts with exponential backoff, jitter, and Service Quotas. Step-by-step SRE troubleshooting guide.

Key Takeaways
  • Root cause 1: Burst API requests exceeding account-level or service-level API call limits, resulting in HTTP 429 ThrottlingException.
  • Root cause 2: Network latency, socket exhaustion, or backend AWS service degradation causing RequestTimeout (HTTP 408/504).
  • Quick fix: Change AWS SDK retry mode to 'adaptive', increase max_attempts, and implement exponential backoff with jitter for custom polling scripts.
  • Long-term fix: Migrate from API polling to EventBridge-driven architectures and request API limit increases via AWS Service Quotas.
Fix Approaches Compared
Method | When to Use | Time to Implement | Risk/Impact
SDK Retry Config (Adaptive) | Immediate fix for SDK/CLI timeouts and minor throttling. | 5 mins | Low
Exponential Backoff + Jitter | Custom API scripts or bash automation hitting hard rate limits. | 30 mins | Low
AWS Service Quota Increase | Sustained high-volume architectural needs (e.g., heavy EC2 scaling). | 1-2 days | Low
Event-Driven (EventBridge) | Replacing heavy List/Describe API polling workloads. | 1-2 weeks | Medium

Understanding the Error: ThrottlingException and RequestTimeout

When interacting with AWS services via the CLI, SDKs (boto3, AWS SDK for Go/Node.js), or direct API calls, DevOps engineers frequently encounter ThrottlingException (HTTP 429 Too Many Requests) or RequestTimeout (HTTP 408/504). AWS enforces API limits globally to ensure fair usage and to protect its control plane from localized DDoS attacks or runaway automation scripts. These limits are evaluated using a token bucket algorithm, and they vary heavily between read operations (e.g., DescribeInstances) and mutating operations (e.g., RunInstances).
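
The token bucket model can be sketched in a few lines of Python. This is a simplified illustration only: AWS does not publish per-API bucket sizes or refill rates, so the capacity and rate below are assumptions chosen to show the behavior.

```python
import time

class TokenBucket:
    """Simplified token bucket: a burst larger than `capacity` gets
    throttled even when the long-run average rate is sustainable."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request allowed
        return False      # would surface as a 429 ThrottlingException

# A tight burst of 25 calls against a 20-token bucket: the tail of the
# burst is rejected even though 10 requests/sec would be fine on average.
bucket = TokenBucket(capacity=20, refill_rate=10)
results = [bucket.try_acquire() for _ in range(25)]
```

This is why a script that is "only" doing a few hundred calls per minute can still be throttled: it is the instantaneous burst, not the average, that drains the bucket.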

Identifying the Exact Error Messages

You will typically see standard output or application logs reflecting:

  • botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeInstances operation: Rate exceeded
  • RequestLimitExceeded: AWS API rate limit exceeded
  • TimeoutError: Read timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"

Step 1: Diagnose the Root Cause

Before indiscriminately applying retries, determine if this is a sustained volume issue or a micro-burst issue.

  1. Analyze CloudTrail Logs: Query CloudTrail using Amazon Athena to identify which IAM role, user, or specific script is generating the most requests. Look for spikes in EventName associated with the throttled service.
  2. Review CloudWatch Metrics: Check the Usage namespace for CallCount metrics. You can also monitor the ClientErrors metric on your NAT Gateways or API Gateways to correlate internal network timeouts with AWS API limits.
  3. Determine Timeout vs. Throttle: If timeouts accompany rate limits, your application's connection pool might be exhausting its sockets while waiting for AWS to respond to repeatedly throttled requests.
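
To see which principal is draining the token bucket, you can aggregate throttle errors per user from CloudTrail output. A sketch, assuming records shaped like the `Events` array returned by `aws cloudtrail lookup-events` (the sample data below is illustrative, not real CloudTrail output):

```python
import json
from collections import Counter

def top_throttled_principals(events):
    """Count ThrottlingException / RequestLimitExceeded errors per
    principal from CloudTrail lookup-events records."""
    counts = Counter()
    for event in events:
        detail = json.loads(event["CloudTrailEvent"])
        if detail.get("errorCode") in ("ThrottlingException", "RequestLimitExceeded"):
            counts[event.get("Username", "unknown")] += 1
    return counts.most_common()

# Illustrative sample shaped like `aws cloudtrail lookup-events` output
sample = [
    {"Username": "ci-runner",
     "CloudTrailEvent": json.dumps({"errorCode": "ThrottlingException"})},
    {"Username": "ci-runner",
     "CloudTrailEvent": json.dumps({"errorCode": "ThrottlingException"})},
    {"Username": "dev-laptop",
     "CloudTrailEvent": json.dumps({"eventName": "DescribeInstances"})},
]
```

Feeding real lookup-events output through this quickly points to the IAM role or script to fix first.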

Step 2: Configure AWS SDK Retry and Adaptive Modes

The standard AWS SDKs implement basic exponential backoff, but under heavy concurrency this isn't enough. AWS introduced the adaptive retry mode for exactly this case. Unlike standard mode (which backs off exponentially but defaults to only three total attempts), adaptive mode dynamically adjusts the client-side request rate based on the throttled responses received from AWS, effectively smoothing out your request spikes locally before they hit the AWS control plane.

Set this in your environment:

export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10
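
In Python, the same settings map onto the botocore retry configuration. The sketch below only builds the configuration dictionary (so it runs without AWS credentials or the boto3 package); in real code you would pass it to `botocore.config.Config(retries=...)` and hand that to `boto3.client`, as shown in the comments.

```python
# Retry configuration in the shape botocore expects. Real usage:
#   from botocore.config import Config
#   import boto3
#   ec2 = boto3.client("ec2", config=Config(retries=RETRIES))
RETRIES = {
    "max_attempts": 10,  # total attempts, including the first call
    "mode": "adaptive",  # client-side rate limiting on top of backoff
}

def describe_retry_mode(mode):
    """Summarize the behavior of each botocore retry mode."""
    return {
        "legacy": "older per-service behavior with limited error coverage",
        "standard": "exponential backoff, 3 total attempts by default",
        "adaptive": "standard behavior plus client-side rate limiting",
    }[mode]
```

The same two knobs can also be set per-profile in `~/.aws/config` (`retry_mode` and `max_attempts`), which is often easier to roll out across a fleet of scripts.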

Step 3: Implement Exponential Backoff with Jitter

If you are writing custom HTTP clients or bash scripts that don't benefit from the AWS SDK's native retry handlers, you must implement exponential backoff with jitter. Standard exponential backoff causes the "Thundering Herd" problem, where multiple blocked threads retry at the exact same exponential intervals. Adding jitter (randomized delay) spreads out the retries, increasing the likelihood of successful token acquisition in the AWS API bucket.
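
The "full jitter" variant of this looks as follows in Python. This is a sketch: `call` stands in for whatever throttled operation you are wrapping, and `ThrottledError` is a placeholder for your client's throttling exception.

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for an SDK/HTTP throttling error (illustrative)."""

def retry_with_full_jitter(call, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Retry `call` on throttling, sleeping a random duration between 0
    and min(cap, base * 2**attempt). Randomizing over the full window
    spreads blocked clients out instead of letting them all retry at
    the same exponential instants (the Thundering Herd problem)."""
    for attempt in range(max_retries):
        try:
            return call()
        except ThrottledError:
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
    return call()  # final attempt; any exception propagates to the caller

```

Injecting `sleep` as a parameter keeps the helper testable; in production you simply leave the default `time.sleep`.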

Step 4: Request a Service Quota Increase

If your baseline architecture naturally exceeds AWS defaults (e.g., a heavily multi-tenant SaaS aggressively polling SQS, or a Kubernetes cluster-autoscaler constantly describing EC2 Auto Scaling groups), request a quota increase. Navigate to the AWS Service Quotas console, locate the specific API operation, and submit a request. Note: STS and IAM API limits (such as sts:AssumeRole) are hard-capped and generally cannot be increased; you must optimize your token caching instead.
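
Since AssumeRole call rates cannot simply be raised, the fix is to stop assuming the role on every request and reuse credentials until shortly before they expire. A minimal caching sketch, with assumptions labeled: `fetch` is a hypothetical callable standing in for your `sts.assume_role` call, and real code would read the expiry from the response's `Credentials.Expiration` field rather than a fixed TTL.

```python
import time

class CachedCredentials:
    """Reuse a credential set until shortly before expiry, so a hot
    code path does not hit sts:AssumeRole on every request."""

    def __init__(self, fetch, ttl_seconds=3600, refresh_margin=300,
                 clock=time.monotonic):
        self._fetch = fetch            # hypothetical: returns fresh credentials
        self._ttl = ttl_seconds        # assumed session lifetime
        self._margin = refresh_margin  # refresh this many seconds early
        self._clock = clock
        self._creds = None
        self._expires_at = 0.0

    def get(self):
        if self._creds is None or self._clock() >= self._expires_at - self._margin:
            self._creds = self._fetch()
            self._expires_at = self._clock() + self._ttl
        return self._creds
```

Refreshing a few minutes early (`refresh_margin`) avoids a race where credentials expire mid-request. Note that boto3's own `assume_role` credential provider already does this caching for you when configured via profiles; the sketch is for custom clients that call STS directly.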

Step 5: Architectural Fixes (Event-Driven Design)

Stop polling. The ultimate SRE fix for rate limits is architectural. If your application constantly calls Describe* or List* APIs to check for state changes, introduce a caching layer like Amazon ElastiCache (Redis) or transition to an event-driven architecture. Use Amazon EventBridge to listen for state changes (e.g., EC2 instance state changes, ECS task terminations) and push those updates to a local database. Query your local database instead of hitting the AWS API control plane.
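
For the EC2 example, the EventBridge rule that replaces DescribeInstances polling matches state-change events with a pattern like the one below (shown as a Python dict you would serialize to JSON for the rule's event pattern; the tiny `matches` helper is an illustrative subset of EventBridge matching, not the real engine).

```python
# Matches EC2 instance state transitions instead of polling DescribeInstances.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "terminated"]},
}

def matches(pattern, event):
    """Illustrative subset of EventBridge matching: every pattern key
    must appear in the event with a value from the allowed list."""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):
            if not matches(allowed, event.get(key, {})):
                return False
        elif event.get(key) not in allowed:
            return False
    return True
```

Point the rule's target at a Lambda function that writes the new state into your local store, and your application reads state with zero calls to the EC2 control plane.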


# Set AWS CLI retry configuration globally to adaptive mode
aws configure set default.retry_mode adaptive
aws configure set default.max_attempts 10

# Diagnose rate limits using CloudTrail and jq (requires jq installed)
# Note: "date -v-1H" is BSD/macOS syntax; on GNU/Linux use: date -d '1 hour ago' +%s
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeInstances \
  --start-time $(date -v-1H +%s) \
  --max-results 50 | jq '.Events[] | {Time: .EventTime, User: .Username, Error: .CloudTrailEvent | fromjson | .errorCode}' | grep Throttling

# SRE Bash wrapper: Exponential backoff with jitter for raw API calls
MAX_RETRIES=5
RETRY_COUNT=0
BASE_DELAY=1

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
  # Execute your AWS CLI command here
  OUTPUT=$(aws ec2 describe-instances 2>&1)
  EXIT_CODE=$?
  
  if [[ $EXIT_CODE -eq 0 ]]; then
    echo "API call succeeded!"
    break
  elif [[ "$OUTPUT" == *"ThrottlingException"* ]] || [[ "$OUTPUT" == *"Timeout"* ]]; then
    # Calculate random jitter between 0 and current BASE_DELAY
    JITTER=$(awk -v min=0 -v max=$BASE_DELAY 'BEGIN{srand(); print min+rand()*(max-min)}')
    
    echo "Rate limited or timed out. Retrying in $JITTER seconds..."
    sleep $JITTER
    
    # Exponential increase for the next loop
    BASE_DELAY=$((BASE_DELAY * 2))
    RETRY_COUNT=$((RETRY_COUNT + 1))
  else
    echo "Critical Error encountered: $OUTPUT"
    exit 1
  fi
done

if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
  echo "Failed after $MAX_RETRIES attempts."
  exit 1
fi

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers, AWS Certified Solutions Architects, and Site Reliability Engineers dedicated to untangling complex cloud infrastructure bottlenecks.
