Troubleshooting AWS ALB 502 Bad Gateway and 504 Gateway Timeout Errors
A comprehensive guide to fixing AWS ALB 502 Bad Gateway and 504 Gateway Timeout errors: root causes, diagnostic steps, and actionable fixes for your backend targets.
- 502 Bad Gateway usually means the target actively closed the connection or returned a malformed response.
- 504 Gateway Timeout means the target failed to respond before the ALB's idle timeout expired.
- Keep-Alive timeout mismatches between the ALB and the backend target are the #1 cause of intermittent 502s.
- Always check ALB access logs and target application logs (e.g., Nginx, Tomcat, Node.js) to pinpoint the exact failure point.
| Error Code | Root Cause | Fix Approach | Resolution Time |
|---|---|---|---|
| 502 Bad Gateway | Target closed connection prematurely (Keep-Alive mismatch) | Increase target's keep-alive timeout to be greater than ALB's idle timeout | Fast (< 15 mins) |
| 502 Bad Gateway | Target returned malformed HTTP response | Fix application code or web server configuration to return valid HTTP/1.1 headers | Medium (Requires code/config change) |
| 504 Gateway Timeout | Target processing took longer than ALB idle timeout | Optimize application performance or increase ALB idle timeout | Medium to Slow |
| 504 Gateway Timeout | Network ACL or Security Group blocking traffic from ALB to Target | Update SG/NACL rules to allow traffic on target port from ALB subnets | Fast (< 10 mins) |
Understanding ALB 502 and 504 Errors
When operating applications behind an AWS Application Load Balancer (ALB), encountering 502 Bad Gateway and 504 Gateway Timeout errors is a common rite of passage. While they look similar to an end user, they indicate fundamentally different interactions between the ALB and your backend targets (EC2 instances, ECS tasks, or Lambda functions).
The Difference: 502 vs. 504
- 502 Bad Gateway: The ALB successfully established a connection with the target, but the target either returned a malformed response or, more commonly, closed the TCP connection before the ALB could send the request or read the response. From the ALB's perspective, the target misbehaved.
- 504 Gateway Timeout: The ALB established a connection and sent the request, but the target failed to send a complete response before a configured timer expired, usually the ALB's `idle timeout` setting. The ALB essentially gave up waiting.
Deep Dive: Fixing 502 Bad Gateway
The most notorious cause of intermittent 502 errors is a mismatch in the TCP Keep-Alive timeout settings between the ALB and the backend web server (like Nginx, Apache, Node.js, or Tomcat).
The Keep-Alive Race Condition
By default, an ALB has an idle timeout of 60 seconds. It uses persistent connections (keep-alives) to communicate with backend targets to improve performance.
If your backend web server has a keep-alive timeout shorter than the ALB's idle timeout (Nginx's default of 75s is safe, but Node.js defaults to just 5s), a race condition occurs:
- The ALB maintains an open idle connection to the target.
- The target's shorter keep-alive timer expires. The target decides to close the connection and sends a `FIN` packet.
- At the same moment, a new client request arrives at the ALB.
- The ALB routes this request down the connection it thinks is still open.
- The target receives a request on a connection it is in the process of closing. It responds with an `RST` (reset) packet.
- The ALB receives the `RST` and serves a `502 Bad Gateway` to the client.
The Fix: Aligning Timeouts
Rule of thumb: The keep-alive timeout of your backend target must be greater than the idle timeout of your ALB.
If your ALB idle timeout is 60 seconds:
- Nginx: Set `keepalive_timeout 65;` in `nginx.conf`.
- Apache: Set `KeepAliveTimeout 65` in `httpd.conf`.
- Node.js: Set `server.keepAliveTimeout = 65000;` and `server.headersTimeout = 66000;`.
Deep Dive: Fixing 504 Gateway Timeout
A 504 error almost always means your backend is too slow, or the network is dropping packets silently.
Scenario 1: Slow Application Processing
If your application involves heavy database queries, calling external slow APIs, or complex computations, it might legitimately take longer than the default 60-second ALB idle timeout to return a response.
Diagnostic Steps:
- Check your application performance monitoring (APM) tools (Datadog, New Relic, AWS X-Ray).
- Look at the `TargetResponseTime` metric in CloudWatch for your ALB (it corresponds to the `target_processing_time` field in the access logs). If this value approaches your ALB's idle timeout before 504s occur, the application is the bottleneck.
The Fix:
- Short-term: Increase the ALB's idle timeout (up to 4000 seconds) in the EC2 Console -> Load Balancers -> Attributes.
- Long-term: Optimize your application code, add database indexes, or move long-running tasks to background queue workers (like SQS/Celery) instead of blocking the HTTP request.
Scenario 2: Silent Network Drops (Security Groups/NACLs)
If the ALB attempts to route traffic to a target, but a Security Group or Network ACL blocks the return traffic or drops the initial SYN packets without a rejection, the connection simply hangs until the ALB times out, resulting in a 504.
Diagnostic Steps:
- Verify the Target Group health checks. If health checks are failing with timeouts, network configuration is the likely culprit.
- Ensure the Target Security Group allows inbound traffic on the target port (e.g., 80, 8080) from the ALB's Security Group.
- Ensure the ephemeral ports (1024-65535) are allowed on the return path if using strict NACLs.
Leveraging ALB Access Logs
ALB Access Logs are your source of truth. Enable them to stream to an S3 bucket. The logs contain specific fields that pinpoint the failure:
- `elb_status_code`: The status the ALB returned to the client (e.g., 502, 504).
- `target_status_code`: The status the target returned to the ALB. If this is `-`, the target didn't return an HTTP response.
- `request_processing_time`, `target_processing_time`, `response_processing_time`: If `target_processing_time` is `-1`, the ALB couldn't reach the target or the connection closed unexpectedly.
By querying these logs using Amazon Athena, you can isolate which specific targets are failing and exactly when the connection is breaking down.
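If you'd rather spot-check a single log entry without setting up Athena, the relevant fields can be pulled apart with a small script. Field positions follow the documented ALB access log layout (type, time, elb, client:port, target:port, then the three processing times and the two status codes); the sample line is fabricated for illustration:

```javascript
// Split a single ALB access log entry into its leading fields.
function parseAlbLogLine(line) {
  const f = line.split(' ');
  return {
    time: f[1],
    clientIp: f[3].split(':')[0],
    targetProcessingTime: parseFloat(f[6]),
    elbStatusCode: parseInt(f[8], 10),
    targetStatusCode: f[9], // may be '-' when no HTTP response came back
  };
}

// Fabricated sample entry showing the classic 502 signature:
// target_processing_time of -1 and no target status code.
const sample = 'http 2024-05-01T12:00:01.000000Z app/my-alb/abc 203.0.113.7:4431 10.0.1.23:8080 0.001 -1 -1 502 - 100 200 "GET http://example.com:80/ HTTP/1.1"';
console.log(parseAlbLogLine(sample));
```

Seeing `targetProcessingTime: -1` alongside a `-` target status confirms the connection broke before the target ever answered.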
```sql
-- Example Amazon Athena query to find 502 and 504 errors in ALB access logs
SELECT
    time,
    client_ip,
    elb_status_code,
    target_status_code,
    target_processing_time,
    request_url,
    target_port_list
FROM alb_logs
WHERE elb_status_code IN (502, 504)
ORDER BY time DESC
LIMIT 50;
```

Error Medic Editorial
The Error Medic Editorial team consists of senior DevOps engineers and Site Reliability Experts dedicated to demystifying complex cloud infrastructure issues.
Sources
- https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html
- https://aws.amazon.com/premiumsupport/knowledge-center/elb-alb-troubleshoot-502-errors/
- https://aws.amazon.com/premiumsupport/knowledge-center/elb-alb-troubleshoot-504-errors/
- https://serverfault.com/questions/918334/aws-alb-intermittent-502-bad-gateway-errors