Troubleshooting AWS API Gateway Rate Limit (429) and Related 5xx/4xx Errors
Resolve AWS API Gateway 429 Too Many Requests, 503 Service Unavailable, and 504 Timeouts. Learn how to configure Usage Plans, adjust quotas, and fix VPC links.
- HTTP 429 (Too Many Requests) indicates you have hit account-level, stage-level, route-level, or Usage Plan rate limits.
- HTTP 504 (Gateway Timeout) means your backend integration took longer than the integration timeout, which is capped at 29 seconds by default for REST APIs.
- HTTP 503 (Service Unavailable) usually points to VPC Link misconfigurations or failing Network Load Balancer target group health checks.
- A 'Missing Authentication Token' response is typically a disguised 404 Not Found caused by incorrect paths, HTTP methods, or un-deployed stages.
- Quick Fix: Check CloudWatch Execution Logs to pinpoint the error source, adjust Usage Plan throttling, or temporarily increase backend concurrency.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Increase Usage Plan Quotas | Hitting 429 errors on specific API keys | 5 mins | Low |
| Request Account Limit Increase | Hitting the per-Region 10,000 RPS account limit | 1-2 days | Low |
| Decouple via SQS / EventBridge | Lambda functions timing out (504 errors) | Days/Weeks | Medium |
| Enable API Caching | High read-heavy traffic causing backend strain | 15 mins | Medium |
Understanding the Error: AWS API Gateway Rate Limits and Throttling
Amazon API Gateway serves as the front door to your backend services. As traffic scales, developers frequently encounter HTTP 429 (Too Many Requests), HTTP 504 (Gateway Timeout), HTTP 503 (Service Unavailable), and HTTP 404 (Not Found) errors. Throttling responses (429) and timeouts (504) act as a protective layer, shielding your backend from traffic spikes, DDoS attacks, and cascading failures, while 503 and 404 responses usually point to configuration problems.
In this comprehensive guide, we will dissect the root causes of AWS API Gateway rate limiting and timeout errors, explore the mechanics of AWS's token bucket algorithm, and provide actionable resolution paths to restore your service health.
The Token Bucket Algorithm and HTTP 429 Too Many Requests
API Gateway uses a token bucket algorithm to throttle requests. By default, AWS provisions an account-level quota of 10,000 requests per second (RPS) with a burst capacity of 5,000 requests across all APIs within a specific AWS Region.
When a client exceeds this rate, API Gateway intercepts the request before it even reaches your integration (like AWS Lambda or an HTTP endpoint) and returns an HTTP 429 Too Many Requests error. The standard response body looks like this:
{"message": "Too Many Requests"}
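The token bucket behavior described above can be illustrated with a small simulation. This is a minimal sketch of the general algorithm, not API Gateway's actual implementation. The `TokenBucket` class, the `simulate` helper, and the scaled-down numbers (2 requests per second, burst of 3) are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy token bucket: `rate` tokens refill per second, up to `burst` tokens."""
    rate: float          # steady-state refill rate (requests per second)
    burst: float         # bucket capacity (maximum burst size)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the previous refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a gateway would answer this request with HTTP 429


def simulate(bucket: TokenBucket, arrivals: list[float]) -> list[int]:
    """Map each arrival time to the status code a throttling gateway would send."""
    return [200 if bucket.allow(t) else 429 for t in arrivals]


if __name__ == "__main__":
    # Scaled-down numbers: 2 requests/second with a burst of 3.
    bucket = TokenBucket(rate=2.0, burst=3.0)
    # Five requests at t=0, then one more half a second later.
    print(simulate(bucket, [0.0, 0.0, 0.0, 0.0, 0.0, 0.5]))
    # prints [200, 200, 200, 429, 429, 200]
```

The burst absorbs the first three simultaneous requests, the next two are rejected, and half a second of refill at 2 tokens per second admits one more.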
The Four Tiers of API Gateway Throttling
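To troubleshoot a 429 error, you must identify which layer is enforcing the limit. Throttling can occur at four distinct levels, listed here from broadest to most specific. Note that the AWS documentation describes API Gateway as applying them in the opposite direction: usage plan limits first, then per-method stage limits, then the account-level limit.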
- Account-Level Limits: The per-Region quota applied to your entire AWS account (default 10,000 RPS, adjustable on request). If one runaway API consumes all 10,000 RPS, your other APIs in that Region will also start returning 429s.
- Stage-Level Limits: Limits defined on a specific API deployment stage (e.g., `prod` or `dev`).
- Method-Level (Route) Limits: Granular limits applied to a specific route, such as `GET /users`.
- Usage Plan Limits: Limits enforced on specific API keys distributed to your clients.
Diagnosing the 429 Source
To determine the source of the throttling, check your CloudWatch Logs.
If you see:
Plan ID xxxxxxxx has exceeded the allocated rate limit
The block is happening at the Usage Plan level.
If you see:
Method capacity exceeded
The block is happening at the Stage or Method level.
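When scanning many log lines, the two messages above can be matched programmatically. The following is a minimal sketch that relies only on the two strings quoted in this section. The function name `classify_throttle` and its return labels are hypothetical.

```python
import re

# Patterns built from the two log messages quoted above.
USAGE_PLAN_PATTERN = re.compile(r"Plan ID (\S+) has exceeded the allocated rate limit")
METHOD_PATTERN = re.compile(r"Method capacity exceeded")


def classify_throttle(log_line: str) -> str:
    """Label a log line with the throttling layer it points to."""
    match = USAGE_PLAN_PATTERN.search(log_line)
    if match:
        # Include the plan ID so the offending Usage Plan can be looked up.
        return f"usage-plan:{match.group(1)}"
    if METHOD_PATTERN.search(log_line):
        return "stage-or-method"
    return "unknown"


if __name__ == "__main__":
    print(classify_throttle("Plan ID abc123 has exceeded the allocated rate limit"))
    print(classify_throttle("Method capacity exceeded"))
```

Lines labeled `unknown` are not necessarily unrelated to throttling; they simply do not contain either of the two messages.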
Resolution Steps for Throttling:
- Usage Plan Limits: Navigate to the API Gateway Console -> Usage Plans -> Select the plan -> Adjust the Rate (requests per second) and Burst limits.
- Account Limits: If you are hitting the per-Region 10,000 RPS limit, request a quota increase through the Service Quotas console (or an AWS Support case).
- Client-Side Remediation: Implement exponential backoff and jitter in your client SDKs. AWS SDKs do this by default, but custom HTTP clients need explicit retry logic.
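For custom HTTP clients, the retry logic mentioned in the last step might look like the following. This is a minimal sketch of exponential backoff with full jitter, assuming the caller supplies a `send` function that returns a `(status, body)` tuple. The name `call_with_backoff` and the default delays are illustrative, not taken from any AWS SDK.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Response:
    """Call `send` until it stops returning 429 or attempts run out.

    Uses "full jitter": each wait is a random fraction of an exponentially
    growing cap, so many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Return on success, on a non-throttling error, or on the final attempt.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng() * cap)
    raise AssertionError("unreachable")
```

The `sleep` and `rng` parameters exist so the function can be exercised in tests without real waiting. In production code the defaults would normally be left alone.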
Decoding HTTP 504: AWS API Gateway Timeout
An HTTP 504 Gateway Timeout occurs when API Gateway fails to receive a response from the backend integration within the maximum integration timeout window.
The Limit: REST APIs have a default maximum integration timeout of 29 seconds, and HTTP APIs allow at most 30 seconds (WebSocket APIs also have a separate idle timeout). Since 2024, AWS lets you request a higher timeout for Regional and private REST APIs through Service Quotas, but the default ceiling applies unless you do so. If your AWS Lambda function, ECS container, or on-premises server takes 29.01 seconds to process the request under the default setting, API Gateway severs the connection and returns a 504 to the client, even if the backend eventually completes the task successfully.
Common Root Causes of 504s
- Lambda Cold Starts: If your API is backed by a Java or .NET Lambda function, especially one with heavy initialization, cold starts can add several seconds of latency. Under heavy load, concurrent cold starts stacked on slow downstream calls can push a request past the timeout.
- Unoptimized Database Queries: The backend might be executing table scans or waiting on database locks.
- Third-Party API Latency: Your backend might be waiting on a slow external webhook or payment gateway.
Resolution Steps for Timeouts
Raising the timeout is often not possible and rarely fixes the underlying problem, so the usual remedy is to decouple the architecture:
- Implement Asynchronous Patterns: Instead of waiting for a long-running process, configure API Gateway to place the payload directly into an Amazon SQS queue or start an AWS Step Functions execution. Return an `HTTP 202 Accepted` to the client immediately with a job ID, and have the client poll for completion.
- Provisioned Concurrency: If cold starts are the culprit, enable Lambda Provisioned Concurrency to keep execution environments warm.
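- Database Optimization: Use Amazon RDS Performance Insights or your database engine's slow query log to find expensive queries, then add the necessary indexes to reduce execution time. DynamoDB has no slow query log, so look for Scan operations and throttled requests in its CloudWatch metrics.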
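The asynchronous pattern in the first step can be sketched as follows. Note that this is a variant in which a thin Lambda function enqueues the work, rather than the direct API Gateway service integration described above. The `make_submit_handler` factory, the injected `enqueue` callable, and the `jobId`/`status` field names are all hypothetical; in a real deployment `enqueue` would wrap an SQS or Step Functions client.

```python
import json
import uuid
from typing import Any, Callable, Dict

# (job_id, payload) -> None; a real implementation would send to a queue.
Enqueue = Callable[[str, str], None]


def make_submit_handler(enqueue: Enqueue) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler that accepts work and returns immediately."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        job_id = str(uuid.uuid4())
        # Hand the payload off instead of processing it within the request.
        enqueue(job_id, event.get("body") or "{}")
        return {
            "isBase64Encoded": False,
            "statusCode": 202,  # Accepted: work is queued, not finished
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"jobId": job_id, "status": "QUEUED"}),
        }

    return handler


if __name__ == "__main__":
    queue = []  # in-memory stand-in for a real queue
    submit = make_submit_handler(lambda job_id, payload: queue.append((job_id, payload)))
    print(submit({"body": '{"report": "monthly"}'}, None)["statusCode"])  # prints 202
```

The client would then poll a separate status endpoint with the returned job ID. That endpoint is outside the scope of this sketch.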
HTTP 503 Service Unavailable & 502 Bad Gateway
While timeouts are straightforward, 503 Service Unavailable and 502 Bad Gateway errors often point to infrastructure networking issues or malformed backend responses.
503 Service Unavailable: The VPC Link Conundrum
If you are routing traffic to private resources (like ECS Fargate tasks or internal ALBs) using an API Gateway VPC Link, a 503 error almost always indicates a networking misconfiguration:
- Target Group Health Checks: The Network Load Balancer (NLB) attached to your VPC Link has marked the backend targets as unhealthy.
- Security Groups: The backend resource's Security Group is not allowing inbound traffic from the NLB's private IP addresses.
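A quick way to separate the two causes above is to test raw TCP reachability of a backend target from a host in the same subnets as the load balancer. The sketch below uses only the standard library. The function name `can_connect`, the example IP address, and the port are placeholders, and a successful connection does not prove the health check path itself returns a healthy response.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # Placeholder target: replace with a backend task's private IP and port.
    print(can_connect("10.0.1.25", 8080))
```

A timeout typically suggests a security group or routing problem, while an immediate refusal suggests nothing is listening on that port.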
502 Bad Gateway: The Lambda Proxy Mismatch
When using Lambda Proxy Integration, your Lambda function MUST return a response object in a very specific JSON format. If your function returns a raw string or an improperly formatted object, API Gateway cannot parse it and throws a 502.
Correct Proxy Response Format:
```json
{
  "isBase64Encoded": false,
  "statusCode": 200,
  "headers": { "Content-Type": "application/json" },
  "body": "{\"message\": \"Success\"}"
}
```
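One way to avoid the mismatch is to build every response through a single helper. This is a minimal sketch for a Python Lambda function. The helper name `proxy_response` is hypothetical, and the key point it illustrates is that `body` must be a JSON string rather than a dictionary.

```python
import json
from typing import Any, Dict, Optional


def proxy_response(
    status_code: int,
    payload: Any,
    headers: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Wrap `payload` in the response shape shown above."""
    merged = {"Content-Type": "application/json"}
    merged.update(headers or {})
    return {
        "isBase64Encoded": False,
        "statusCode": int(status_code),
        "headers": merged,
        "body": json.dumps(payload),  # serialized to a string, never a raw dict
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Example Lambda entry point that always returns a well-formed response."""
    return proxy_response(200, {"message": "Success"})


if __name__ == "__main__":
    print(handler({}, None)["body"])  # prints {"message": "Success"}
```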
The Bizarre 403 / 404: API Gateway Not Found & Missing Authentication Token
One of the most notoriously confusing errors in AWS is receiving a 403 Forbidden with the message:
{"message": "Missing Authentication Token"}
Despite the message, this rarely has anything to do with authentication (unless you are actually missing an IAM SigV4 signature on an IAM-protected route).
The Real Cause: This error typically means 404 Not Found. API Gateway returns this message when a client requests a path or HTTP method that does not exist in the API definition. A commonly cited rationale is that answering with 403 instead of 404 avoids revealing whether a route actually exists.
Resolving "Not Found" Errors
- Check the HTTP Method: Are you sending a `POST` request to an endpoint that only accepts `GET`?
- Check the Stage: Did you deploy your changes? In API Gateway, saving a resource does not make it live. You must explicitly click Deploy API and select a stage.
- Custom Domain Path Mapping: If you are using a Custom Domain Name, verify that the API mapping points the base path to the correct API and stage.
- Trailing Slashes: API Gateway treats `/users` and `/users/` as two completely separate resources. If you define `/users`, requesting `/users/` will result in a Missing Authentication Token error.
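The method and trailing-slash checks above can be automated against a list of the routes you believe are deployed. This is a toy model for triage, not a reproduction of API Gateway's routing. The function name `diagnose` and its return labels are hypothetical, and it only handles literal paths (no path parameters or greedy proxies).

```python
from typing import Set, Tuple

Route = Tuple[str, str]  # (HTTP method, resource path)


def diagnose(routes: Set[Route], method: str, path: str) -> str:
    """Explain why a request might not match any defined route."""
    method = method.upper()
    if (method, path) in routes:
        return "match"
    paths = {p for _, p in routes}
    if path in paths:
        return "method-mismatch"
    # Toggle the trailing slash and look again.
    if path.endswith("/") and len(path) > 1:
        alternate = path[:-1]
    else:
        alternate = path + "/"
    if alternate in paths:
        return "trailing-slash-mismatch"
    return "unknown-path"


if __name__ == "__main__":
    routes = {("GET", "/users"), ("POST", "/orders")}
    print(diagnose(routes, "GET", "/users/"))  # prints trailing-slash-mismatch
```

A `match` result for a request that still fails in practice points toward the remaining two checks: an un-deployed stage or a wrong custom domain mapping.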
Advanced Diagnostic Commands
To effectively troubleshoot, you need to rely heavily on AWS CLI tools and CloudWatch.
1. Enable Execution Logging
Execution logging is distinct from Access logging. Execution logs record the internal processing steps of API Gateway, including transformation, validation, and integration request/response details.
Ensure you set the log level to INFO or ERROR in the Stage settings.
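The log level can also be set programmatically. The sketch below builds the patch operations for a REST API stage and passes them to boto3's `update_stage`. It assumes boto3 is installed, credentials are configured, and the account already has a CloudWatch Logs role set for API Gateway. The helper names are hypothetical, and the `/*/*/` prefix applies the setting to all methods in the stage.

```python
from typing import Any, Dict, List

VALID_LEVELS = ("OFF", "ERROR", "INFO")


def logging_patch_ops(level: str = "INFO", data_trace: bool = False) -> List[Dict[str, str]]:
    """Build stage patch operations that set execution logging for all methods."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    return [
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": level},
        # dataTrace logs full request/response bodies; leave it off in production.
        {"op": "replace", "path": "/*/*/logging/dataTrace", "value": str(data_trace).lower()},
    ]


def enable_execution_logging(rest_api_id: str, stage_name: str, level: str = "INFO") -> Dict[str, Any]:
    """Apply the patch operations to a stage (requires boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=logging_patch_ops(level),
    )


if __name__ == "__main__":
    for op in logging_patch_ops("ERROR"):
        print(op)
```

Calling `enable_execution_logging("a1b2c3d4e5", "prod")` would apply INFO-level logging to the example stage used elsewhere in this guide.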
2. Querying CloudWatch Logs with Logs Insights
Use this CloudWatch Logs Insights query to find requests that exceeded the integration timeout:
```
fields @timestamp, @message
| filter @message like /Execution failed due to a timeout error/
| sort @timestamp desc
| limit 20
```
3. AWS X-Ray Tracing
Enable AWS X-Ray active tracing on your API stage. X-Ray provides a visual service map showing exactly where latency is introduced, whether in API Gateway, the Lambda function, or downstream AWS services like DynamoDB or S3.
Conclusion
Mastering AWS API Gateway troubleshooting requires understanding the boundaries of the service. Remember that 429s are designed to protect your system, 504s reflect an integration timeout ceiling that is best addressed architecturally, and 503/404s require careful inspection of your networking and deployment lifecycles. By applying usage plans correctly, shifting long-running tasks to asynchronous patterns, and leveraging detailed logging, you can maintain a highly available and resilient API tier.
```bash
# Check current stage throttling settings
aws apigateway get-stage \
  --rest-api-id a1b2c3d4e5 \
  --stage-name prod | jq '.methodSettings'

# Update stage-level throttling (rate limit: 5000, burst limit: 2000).
# The patch operations are quoted so the shell does not glob-expand /*/*/.
aws apigateway update-stage \
  --rest-api-id a1b2c3d4e5 \
  --stage-name prod \
  --patch-operations \
    'op=replace,path=/*/*/throttling/rateLimit,value=5000' \
    'op=replace,path=/*/*/throttling/burstLimit,value=2000'

# Tail API Gateway execution logs to spot 5xx or 429 errors.
# REST API execution logs use the group API-Gateway-Execution-Logs_<api-id>/<stage>.
aws logs tail API-Gateway-Execution-Logs_a1b2c3d4e5/prod --follow --format short
```
Error Medic Editorial
A collective of senior SREs, DevOps engineers, and cloud architects dedicated to demystifying modern infrastructure and accelerating incident resolution.
Sources
- https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html
- https://aws.amazon.com/premiumsupport/knowledge-center/api-gateway-504-errors/
- https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-known-issues.html
- https://stackoverflow.com/questions/43708017/aws-api-gateway-missing-authentication-token