Error Medic

Troubleshooting Azure API Timeout Errors: 504 Gateway Timeout and 502 Bad Gateway Fixes

Resolve Azure API timeout errors (504/502) quickly. Learn how to diagnose Application Gateway, API Management, and App Service timeouts with actionable fixes.

Key Takeaways
  • Azure API timeouts typically manifest as HTTP 504 Gateway Timeout or HTTP 502 Bad Gateway errors across API Management (APIM), Application Gateway, or App Services.
  • Common root causes include backend processing delays exceeding the default APIM/App Gateway timeout limits, SNAT port exhaustion, or unoptimized database queries.
  • Quick fixes involve increasing the default timeout limits via Azure CLI/Portal, implementing asynchronous patterns for long-running operations, or scaling backend resources.
Azure API Timeout Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Increase API Management Timeout (`forward-request`) | Backend predictably takes longer than 20 seconds for specific endpoints. | 5 mins | Low if isolated; High if applied globally (risks connection pool exhaustion) |
| Scale Up/Out Backend (App Service/AKS) | CPU/memory metrics show resource exhaustion causing request queuing. | 15 mins | Low (but incurs higher billing costs) |
| Implement Asynchronous Request-Reply Pattern (202 Accepted) | Long-running operations (>120 s) such as report generation or batch processing. | Days | Medium (requires client and backend architectural changes) |
| Resolve SNAT Port Exhaustion (NAT Gateway) | Outbound connections to databases or external APIs time out under load. | 30 mins | Low |

Understanding Azure API Timeout Errors

When working with Azure's ecosystem—whether routing through Azure API Management (APIM), Azure Application Gateway, Azure Front Door, or directly hitting an Azure App Service—one of the most frustrating interruptions is the dreaded API timeout. These typically surface to the client as either an HTTP 504 Gateway Timeout or an HTTP 502 Bad Gateway error.

The anatomy of an Azure timeout is tied directly to your infrastructure topology: every hop in the networking stack enforces its own timeout. Azure Load Balancer has a default idle timeout of 4 minutes. Azure App Service has a fixed request timeout of 230 seconds (a platform limit). Azure API Management (APIM) has a default forward-request timeout of 20 seconds. If your backend takes 30 seconds to respond, it may well process the request successfully, but APIM drops the connection at the 20-second mark and returns a 504 to the client.
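These stacked limits mean the tightest hop wins. A minimal Python sketch makes that concrete (the dictionary below simply restates the defaults above; nothing here calls Azure):

```python
# Default timeouts, in seconds, for each hop discussed above.
DEFAULT_TIMEOUTS = {
    "azure_load_balancer_idle": 240,  # 4-minute idle timeout
    "app_service_request": 230,
    "apim_forward_request": 20,
}

def effective_timeout(hops: dict) -> int:
    """The tightest limit in the chain decides when the client sees a 504."""
    return min(hops.values())

print(effective_timeout(DEFAULT_TIMEOUTS))  # prints 20: APIM cuts off first
```

Raising any single hop's timeout only helps if it was the minimum: a backend that needs 120 seconds still fails until the 20-second APIM limit is also raised.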

Common Error Messages

  • From APIM: { "statusCode": 504, "message": "Gateway Timeout" }
  • From Application Gateway: 504 Gateway Time-out - The server didn't respond in time.
  • From App Service (Docker): Container [Name] didn't respond to HTTP pings on port: 8080, failing site start. See container logs for debugging.
  • From IIS Application Request Routing (ARR): The specified CGI application encountered an error and the server terminated the process.

Step 1: Diagnose the Exact Bottleneck

Before modifying infrastructure, you must identify where the timeout is occurring. Is it the client dropping the connection? The API Gateway? Or the backend database locking up?

Using Azure Application Insights

If you have Application Insights enabled, navigate to the Performance or Failures blade. Look for the Dependency execution times. If your API is timing out, it's highly likely a downstream dependency (like an Azure SQL database query or a third-party REST API call) is dragging the response time down.

Query Log Analytics to find the exact request durations to pinpoint the tier causing the issue:

requests
| where resultCode == "504" or resultCode == "502"
| project timestamp, operation_Name, duration, resultCode, client_IP
| order by duration desc

Diagnosing SNAT Port Exhaustion

If your App Service makes numerous outbound calls (e.g., to an external API or database) without connection pooling, you may be hitting SNAT (Source Network Address Translation) port exhaustion. This prevents new outbound connections from being established, resulting in a timeout. In the portal, go to your App Service -> Diagnose and solve problems -> SNAT Port Exhaustion to verify whether port allocation has reached its maximum.

Step 2: Implement the Fix

Once you have identified the bottleneck, apply the corresponding fix. Be aware that increasing timeouts is often treating the symptom; optimizing backend performance treats the disease.

Fix A: Increase the APIM Timeout Policy

If your backend requires more than 20 seconds and it is a legitimate architectural requirement (for example, a legacy system that cannot be easily optimized), you can increase the timeout limit in the APIM policy using the <forward-request> element.

Navigate to APIM -> Select your API -> Design -> Inbound processing -> Code View:

<policies>
    <inbound>
        <base />
    </inbound>
    <backend>
        <!-- Increase timeout to 120 seconds -->
        <forward-request timeout="120" />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>

Warning: Do not set this globally. Apply it only to the specific operations that require it to prevent blocking the APIM thread pool and causing cascading failures across your other APIs.

Fix B: Application Gateway Request Timeout

If you are using Azure Application Gateway, the default request routing timeout is 20 seconds. If your backend VMs or App Services take longer to compute the response, you must update the HTTP settings associated with your routing rules.

Navigate to Application Gateway -> HTTP settings -> Select your setting -> update the Request timeout (seconds) field to a higher value (e.g., 120). You can also do this via the Azure CLI or Terraform for infrastructure-as-code deployments.

Fix C: Addressing App Service 230-Second Limit

Azure App Service has a hard limit that cannot be raised: the Azure Load Balancer drops any connection that stays idle for 230 seconds. If a request takes longer than roughly 3.8 minutes, you cannot simply "increase the timeout"; you must fundamentally change the architecture.

The Asynchronous Request-Reply Pattern: Instead of holding the HTTP connection open while processing a massive file or report, refactor your API to return an HTTP 202 Accepted immediately, along with a Location header pointing to a status endpoint.

  1. Client sends POST /api/reports.
  2. API queues a background job (using Azure Service Bus, RabbitMQ, or Azure Storage Queues) and immediately returns 202 Accepted with Location: /api/reports/status/123.
  3. Worker (e.g., Azure Functions or a WebJob) processes the queue message independently.
  4. Client polls /api/reports/status/123 every few seconds until it returns 200 OK with the final payload or a download link.
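The four steps above can be sketched in-process in Python. This is a simplified stand-in, not production code: a dict replaces the durable queue and status store (Azure Service Bus or Storage Queues), a thread replaces the separate worker (an Azure Function or WebJob), and the route paths and result URL are illustrative.

```python
import threading
import time
import uuid

jobs = {}  # stand-in for a durable status store

def submit_report():
    """POST /api/reports: queue the job, return 202 Accepted immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"state": "queued", "result": None}
    threading.Thread(target=_worker, args=(job_id,), daemon=True).start()
    return 202, {"Location": f"/api/reports/status/{job_id}"}

def _worker(job_id):
    """Background worker: runs the long operation off the HTTP request path."""
    time.sleep(0.1)  # simulate slow report generation
    jobs[job_id] = {"state": "done", "result": "https://example.com/report.pdf"}

def poll_status(job_id):
    """GET /api/reports/status/{id}: 202 while running, 200 when finished."""
    job = jobs[job_id]
    if job["state"] == "done":
        return 200, {"result": job["result"]}
    return 202, {"state": job["state"]}

# Client flow: submit, then poll until the job completes.
status, headers = submit_report()
job_id = headers["Location"].rsplit("/", 1)[-1]
code, body = poll_status(job_id)
while code != 200:
    time.sleep(0.05)
    code, body = poll_status(job_id)
```

The key property is that the initial HTTP exchange finishes in milliseconds, so no gateway timeout in the chain is ever in play; only the cheap polling requests traverse the stack repeatedly.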

Fix D: Optimizing Backend Dependencies and Connection Pooling

If the timeout is caused by a slow database query or resource exhaustion, increasing the API timeout is merely a band-aid. Consider these optimizations:

  • Database Tuning: Check for missing database indexes or locking issues in Azure SQL.
  • Connection Pooling: Ensure your application uses singletons for HTTP Clients (HttpClient in .NET, requests.Session() in Python) to prevent socket starvation.
  • Asynchronous Code: Ensure you are using async/await throughout your entire application stack to prevent thread-blocking under high concurrent load.
  • Caching: Implement a caching layer using Azure Cache for Redis to serve frequently requested data in milliseconds rather than querying the primary database repeatedly.
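To illustrate the connection-pooling bullet, here is a minimal Python sketch of the singleton idea (the host name is hypothetical, and constructing the object does not open a socket): every caller shares one client per host, so the keep-alive socket, and with it the SNAT port, is reused rather than reallocated per request.

```python
import functools
import http.client

@functools.lru_cache(maxsize=None)
def get_connection(host):
    # One cached client per host. All callers receive the same instance,
    # so the underlying keep-alive socket is reused instead of a new
    # socket (and a new SNAT port) being consumed on every request.
    return http.client.HTTPSConnection(host, timeout=10)

first = get_connection("api.example.com")   # hypothetical host
second = get_connection("api.example.com")
assert first is second  # same pooled client, no extra SNAT port
```

In .NET the same idea is a single static HttpClient; in Python production code, a module-level requests.Session() serves the same purpose.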

Useful Azure CLI Commands

# Check current Application Gateway HTTP Settings timeout
az network application-gateway http-settings show \
  --gateway-name MyGateway \
  --resource-group MyResourceGroup \
  --name MyHttpSettings \
  --query "requestTimeout"

# Increase Application Gateway timeout to 120 seconds
az network application-gateway http-settings update \
  --gateway-name MyGateway \
  --resource-group MyResourceGroup \
  --name MyHttpSettings \
  --request-timeout 120

# Query Log Analytics via CLI to find 504 Timeout occurrences
az monitor log-analytics query \
  --workspace-id <your-workspace-id> \
  --analytics-query "requests | where resultCode == '504' | summarize count() by operation_Name"

Error Medic Editorial

A dedicated team of Senior Site Reliability Engineers and DevOps practitioners sharing hard-learned lessons on cloud infrastructure, debugging, and system architecture.
