Error Medic

Fixing AWS API Rate Limit (ThrottlingException) and Timeout Errors

Resolve AWS API 'Rate exceeded' (429) and timeout (504) errors. A complete SRE guide to exponential backoff, jitter, SDK adaptive retries, and quota increases.

Key Takeaways
  • Root Cause 1: Bursting application workloads exceeding AWS account-level or service-level API rate limits (Token Bucket exhaustion), resulting in HTTP 429 Too Many Requests.
  • Root Cause 2: Network bottlenecks, NAT Gateway port exhaustion, or unoptimized backend integrations causing HTTP 504 Gateway Timeouts or Client Connection Timeouts.
  • Root Cause 3: Thundering herd problems caused by synchronous, aggressive retries without jitter, leading to cascading control-plane failures.
  • Quick Fix Summary: Enable 'adaptive' retry modes in AWS SDKs, implement exponential backoff with full jitter for custom API calls, and request Service Quota increases for persistent throttling.
AWS Rate Limit & Timeout Fix Approaches Compared
Method | When to Use | Time to Implement | Risk / Cost
SDK Adaptive Retries & Jitter | Immediate fix for burst 429s and 'Rate exceeded' exceptions. | Minutes (env var / SDK config) | Low / Free
Service Quota Increase | Persistent throttling under normal expected traffic load. | Days (AWS Support SLA) | Low / Free
Implement Caching (ElastiCache) | Read-heavy workloads repeatedly polling AWS APIs (e.g., DescribeInstances). | Weeks (code changes) | Medium / Infrastructure cost
Decoupling via SQS / EventBridge | Write-heavy batch jobs exceeding transactions-per-second (TPS) limits. | Months (re-architecture) | High / Engineering effort

Understanding AWS API Rate Limits and Timeouts

When operating at scale on Amazon Web Services (AWS), you will inevitably encounter API rate limits (throttling) and timeouts. AWS uses a token bucket algorithm to ensure fair usage and protect the stability of their control planes. When your AWS accounts or IAM roles consume API tokens faster than they are replenished, AWS responds with throttling errors.
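The token bucket model is easy to see in a short simulation. This is an illustrative sketch, not AWS's implementation, and the refill rate and capacity below are made-up numbers rather than real service quotas:

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Replenish tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # Bucket empty: this is the throttled (429) case.

# A burst of 25 calls against a bucket holding 20 tokens:
bucket = TokenBucket(rate=5, capacity=20)
results = [bucket.allow() for _ in range(25)]
print(results.count(True), results.count(False))  # roughly 20 allowed, 5 throttled
```

The burst capacity absorbs spikes, but a sustained rate above the refill rate is throttled no matter how large the bucket is, which is why retries alone cannot fix persistent over-limit traffic.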

Simultaneously, network latency, misconfigured VPC routing, or strict integration timeout limits (like API Gateway's 29-second hard limit) can cause persistent timeout errors. As a DevOps engineer or Site Reliability Engineer (SRE), you must distinguish between control plane limits, data plane limits, and network-level timeouts to implement the correct mitigations.

Identifying the Exact Error Messages

Before fixing the issue, identify the exact error signature in your application logs or CloudWatch streams:

1. The 429 Too Many Requests (Throttling)

  • botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the DescribeInstances operation: Rate exceeded
  • com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException: The level of configured provisioned throughput for the table was exceeded.
  • HTTP 429 Too Many Requests - Rate Limit Exceeded

2. The 504 Gateway Timeout or Client Timeout

  • botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://ec2.us-east-1.amazonaws.com/"
  • HTTP 504 Gateway Timeout - The server didn't respond in time.
  • Task timed out after 3.00 seconds (AWS Lambda timeout communicating with AWS APIs)

Step 1: Diagnosing the Bottleneck

To effectively troubleshoot, you need visibility. Blindly increasing retries can cause a "thundering herd" problem, making the outage worse.

Analyzing CloudTrail for Throttling

AWS CloudTrail logs every API call. When throttling occurs, the errorCode field is populated with ThrottlingException or Client.RequestLimitExceeded. You can query your CloudTrail logs using Amazon Athena to find the offending IAM principal or IP address:

SELECT useridentity.arn, eventsource, eventname, count(*)
FROM cloudtrail_logs
WHERE errorcode = 'ThrottlingException' 
  AND eventtime > '2023-10-01T00:00:00Z'
GROUP BY useridentity.arn, eventsource, eventname
ORDER BY count(*) DESC;

This query reveals which microservice or Lambda function is aggressively polling the AWS API.

Checking Metrics in CloudWatch

Navigate to CloudWatch -> Metrics -> Usage -> By AWS Resource. Look for CallCount and ThrottleCount. For services like API Gateway, monitor the 5XXError and IntegrationLatency metrics to diagnose timeouts.


Step 2: Fixing AWS API Rate Limits (429 Throttling)

Solution A: Exponential Backoff and Jitter

If you retry a failed AWS API request immediately, you are likely to be throttled again. The industry standard is Exponential Backoff with Jitter. Backoff increases the wait time between retries (e.g., 1s, 2s, 4s, 8s), while Jitter adds randomness so multiple threads don't retry at the exact same millisecond.
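A minimal sketch of full jitter, using a hypothetical ThrottlingError as a stand-in for the SDK's ThrottlingException (the AWS SDKs implement this for you once a retry mode is configured):

```python
import random
import time

class ThrottlingError(Exception):
    """Stand-in for the SDK's ThrottlingException."""

def retry_with_full_jitter(call, max_attempts=5, base=1.0, cap=30.0):
    """Retry `call` on throttling, sleeping a random duration in
    [0, min(cap, base * 2**attempt)] between attempts (full jitter)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # Retries exhausted: surface the error to the caller.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Demo: a call that is throttled twice, then succeeds.
attempts = []
def flaky_describe_instances():
    attempts.append(1)
    if len(attempts) < 3:
        raise ThrottlingError("Rate exceeded")
    return "ok"

result = retry_with_full_jitter(flaky_describe_instances, base=0.01)
print(result, len(attempts))  # "ok" after 3 attempts
```

Because each sleep is drawn uniformly from the full window rather than a fixed delay, concurrent clients that were throttled together spread their retries apart instead of colliding again.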

AWS SDKs have built-in retry handlers. However, the default standard mode makes at most three total attempts (the initial call plus two retries). You should enable adaptive retry mode, which adds a client-side rate limiter that dynamically slows your request rate to stay within the AWS service's acceptable throughput.

Enable adaptive mode via environment variables (supported by the AWS CLI and modern AWS SDKs):

export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10

Solution B: Caching Aggressive Polling

A common anti-pattern is calling DescribeInstances or GetSecretValue on every single web request. If an AWS Lambda function checks AWS Secrets Manager on every invocation during a traffic spike, you will hit rate limits instantly.

  • Fix: Implement in-memory caching or use the official AWS Secrets Manager client-side caching libraries (or, on Lambda, the AWS Parameters and Secrets Lambda Extension).
  • Fix: For EC2/ECS metadata, cache the results using Redis/Memcached rather than querying the AWS control plane repeatedly.
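Both fixes boil down to memoizing the control-plane response for a short TTL. Here is a minimal sketch with an injectable fetch function standing in for a real get_secret_value call; in production, prefer the official caching libraries:

```python
import time

class TTLCache:
    """Cache values from `fetch(key)` for `ttl` seconds to avoid
    hitting a control-plane API on every request."""
    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]          # Cache hit: no API call made.
        value = self.fetch(key)      # Cache miss: one upstream API call.
        self._store[key] = (value, self.clock() + self.ttl)
        return value

# Demo with a counting stub in place of secretsmanager.get_secret_value:
calls = []
def fetch_secret(name):
    calls.append(name)
    return f"value-of-{name}"

cache = TTLCache(fetch_secret, ttl=300)
for _ in range(1000):
    cache.get("db-password")
print(len(calls))  # one upstream call instead of 1000
```

A five-minute TTL turns thousands of per-request control-plane calls into one, at the cost of reading a slightly stale value until the entry expires.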

Solution C: Requesting a Service Quota Increase

If your architecture is optimized but your baseline traffic genuinely requires higher API throughput, request a quota increase.

  1. Go to the AWS Service Quotas console.
  2. Search for the throttled service (e.g., Amazon Elastic Compute Cloud (Amazon EC2)).
  3. Find the specific API (e.g., Describe API requests).
  4. Click Request quota increase and provide your desired TPS (Transactions Per Second) and technical justification.

Step 3: Fixing AWS API Timeouts (504 Errors)

API timeouts require a different approach. A timeout means the request was sent, but the client gave up waiting for the server, or an intermediary proxy closed the connection.

Solution A: Resolving NAT Gateway Port Exhaustion

If your Lambda functions or EC2 instances in a private subnet are experiencing ConnectTimeoutError when calling public AWS APIs (like S3 or DynamoDB), they are likely routing through a NAT Gateway. A NAT Gateway supports roughly 55,000 simultaneous connections to each unique destination (IP address, destination port, and protocol); beyond that, new connections fail and surface as timeouts.

  • Fix: Deploy VPC Endpoints (AWS PrivateLink). This routes traffic to AWS services internally via the AWS backbone, bypassing the NAT Gateway entirely. This drastically reduces latency and eliminates NAT port exhaustion timeouts.

Solution B: Bypassing the API Gateway 29-Second Hard Limit

Amazon API Gateway enforces a 29-second integration timeout by default (historically a hard limit; AWS now allows increases for Regional REST APIs via a quota request, traded against reduced account-level throttling quota). If your backend (Lambda or an ALB) takes 30 seconds to process a request, API Gateway will return a 504 Gateway Timeout, even if the backend eventually succeeds.

  • Fix 1 (Asynchronous): Change the architecture to asynchronous processing. API Gateway should place the request into an SQS queue and immediately return a 202 Accepted. The client can then poll a status endpoint or receive a WebSocket/SNS notification when the job is done.
  • Fix 2 (Step Functions): Use AWS Step Functions to orchestrate long-running workflows; the API Gateway service integration can start an execution and immediately return its execution ARN while the workflow runs in the background.
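The accept-and-poll flow in Fix 1 can be sketched with an in-memory queue standing in for SQS and a dict standing in for a status table; the handler names and response shapes here are illustrative, not an AWS API:

```python
import uuid
from collections import deque

queue = deque()   # stands in for the SQS queue
job_status = {}   # stands in for a status table (e.g., DynamoDB)

def submit_handler(payload):
    """Front-door handler: enqueue and return 202 immediately,
    well inside API Gateway's integration timeout."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = "PENDING"
    queue.append((job_id, payload))
    return {"statusCode": 202, "jobId": job_id}

def worker():
    """Background consumer: does the slow work off the request path."""
    while queue:
        job_id, payload = queue.popleft()
        # ... long-running processing happens here ...
        job_status[job_id] = "DONE"

def status_handler(job_id):
    """Polling endpoint the client calls until the job completes."""
    return {"statusCode": 200, "status": job_status.get(job_id, "UNKNOWN")}

resp = submit_handler({"report": "monthly"})
worker()
print(status_handler(resp["jobId"]))  # status flips from PENDING to DONE
```

The client never waits on the slow path: it gets a job ID in milliseconds and polls (or subscribes) for completion, so no single HTTP request ever approaches the gateway timeout.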

Solution C: Tuning TCP Keep-Alives and Socket Timeouts

Sometimes, idle connections are dropped silently by load balancers or firewalls. Ensure your HTTP client has TCP Keep-Alives enabled and that the SDK socket timeout is slightly higher than the expected maximum response time.

For Node.js AWS SDK v3:

import { NodeHttpHandler } from "@smithy/node-http-handler";
import { S3Client } from "@aws-sdk/client-s3";

const client = new S3Client({
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 5000, // 5 seconds
    socketTimeout: 30000, // 30 seconds
  }),
});
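A rough boto3 equivalent, assuming a recent botocore release (the tcp_keepalive option is not available in older versions):

```python
import boto3
from botocore.config import Config

keepalive_config = Config(
    connect_timeout=5,    # seconds to establish the TCP connection
    read_timeout=30,      # seconds to wait for data on the socket
    tcp_keepalive=True,   # keep idle connections alive through NATs/LBs
)

s3 = boto3.client("s3", region_name="us-east-1", config=keepalive_config)
```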

Conclusion

Mastering AWS API interactions requires shifting from synchronous, aggressive scripting to resilient, asynchronous, and fault-tolerant architectures. By leveraging adaptive retries, caching control-plane queries, utilizing VPC endpoints, and respecting hard timeouts, your infrastructure will remain stable even under immense scale and load.

Configuration Examples

# Example 1: Configuring AWS CLI for Adaptive Retries and Jitter
# Place these in your ~/.bashrc or CI/CD pipeline script
export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10

# Test the configuration with a control plane API call
aws ec2 describe-instances --region us-east-1

# Example 2: Python (Boto3) explicit configuration for API resilience
import boto3
from botocore.config import Config

custom_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    },
    connect_timeout=5,
    read_timeout=60
)

client = boto3.client('ec2', config=custom_config)
response = client.describe_instances()

Error Medic Editorial

Error Medic Editorial is composed of Senior Site Reliability Engineers and AWS Certified Solutions Architects dedicated to demystifying complex cloud infrastructure failures and providing actionable, production-ready solutions.
