Azure API Timeout: Fix 504 Gateway Timeout and RequestTimeout Errors in Azure API Management, Functions, and ARM
Diagnose and fix Azure API timeout errors (504, 408, RequestTimeout) across API Management, Functions, and ARM. Includes policy fixes, host.json config, and CLI
- Azure API Management returns HTTP 504 with body '{"statusCode":504,"message":"Backend service timeout"}' when the backend does not respond within the timeout in effect on <forward-request>. Set the timeout attribute on <forward-request> explicitly (values above 240 seconds may not be honored) or switch to an async LRO polling pattern. <set-backend-service> has no timeout attribute
- Azure Functions on the Consumption plan cap execution at 10 minutes regardless of host.json settings; operations that cannot complete within that window must move to a Premium or Dedicated plan or be redesigned with Durable Functions
- Network-layer timeouts (Azure Load Balancer 4-minute idle TCP, Application Gateway 20-second default) are independent of APIM and require separate configuration changes via the Azure portal or CLI
- Quick fix for APIM: add <forward-request timeout="120" /> to the backend section of the policy; quick fix for Functions: set functionTimeout in host.json; quick fix for App Gateway: az network application-gateway http-settings update --timeout 120
| Method | When to Use | Time to Implement | Risk |
|---|---|---|---|
| Increase APIM backend timeout policy | Backend genuinely needs >30s and you control APIM policy | 5 minutes | Low — scoped to specific operation or API |
| Async LRO polling pattern in APIM | Operations >240s; must offload to background job | 1–4 hours | Medium — requires client-side polling logic changes |
| Optimize backend response time | Backend slow due to cold starts, N+1 queries, or heavy compute | Hours to days | Low — improves reliability across all consumers |
| Application Gateway request timeout tuning | 504/502 occurs at the WAF/AGW layer before traffic reaches APIM | 10 minutes | Low — change requestTimeout in backend HTTP settings |
| Azure Function timeout + plan upgrade | Function hitting the 5-minute Consumption default or the 10-minute Consumption maximum | 15–30 minutes | Low for config; Medium for plan migration (cost increase) |
| APIM retry policy with exponential backoff | Transient timeouts from cold starts, brief backend spikes | 30 minutes | Low — defensive pattern, does not mask persistent issues |
Understanding Azure API Timeout Errors
Azure API timeouts surface at multiple layers of the request pipeline. Correct diagnosis requires identifying where the clock ran out before changing any configuration. The exact error messages differ by layer:
- Azure API Management: HTTP 504 Gateway Timeout, body: {"statusCode": 504, "message": "Backend service timeout"}
- Azure API Management (forward-request): Microsoft.Azure.ApiManagement.Gateway.BackendRequestException: The request to the backend service timed out.
- Azure Functions (Consumption plan): Microsoft.Azure.WebJobs.Host.FunctionTimeoutException: Timeout value of 00:05:00 exceeded by function 'FunctionName'
- ARM REST API: {"code": "OperationTimeout", "message": "The operation 'Microsoft.Compute/virtualMachines/write' timed out"}, or a 202 Accepted that never transitions to Succeeded
- Azure Load Balancer: silent TCP RST after 4 minutes of idle; there is no HTTP error code, the connection simply drops
- Application Gateway / WAF: HTTP 502 Bad Gateway or HTTP 504 Gateway Timeout with an x-appgw-trace-id header present
Step 1: Identify Which Layer Is Timing Out
Always capture full HTTP response headers before changing configuration.
1a. Inspect the Via and x-ms-request-id headers.
If the response contains Via: 1.1 apim-gateway-xxxxx (Azure API Management), the request reached APIM and the timeout occurred between APIM and your backend. If no Via header is present and you see a 504, the timeout happened upstream (Application Gateway or Load Balancer).
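The header checks above can be turned into a small triage helper. This is a minimal sketch, not an official detection method: the function name and the returned labels are hypothetical, and the rules only encode the heuristics described in this guide (APIM's 504 body, the Via header, and the x-appgw-trace-id header).

```python
def classify_timeout_layer(status, headers, body=""):
    """Guess which layer produced a 502/504, using this guide's heuristics."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    if status not in (502, 504):
        return "not-a-gateway-timeout"
    # APIM's own timeout body is the most specific signal, so check it first.
    if "Backend service timeout" in body:
        return "apim-to-backend"
    if "x-appgw-trace-id" in h:
        return "application-gateway"
    if "azure api management" in h.get("via", "").lower():
        return "apim-to-backend"
    # No APIM or Application Gateway fingerprint: look upstream of APIM.
    return "upstream-of-apim"
```

Feed it the status, headers, and body captured with curl -v or your HTTP client; treat the result as a starting point for the log queries in the next steps rather than a verdict.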
1b. Query APIM gateway logs in Log Analytics.
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where ResponseCode == 504 or DurationMs > 28000
| project TimeGenerated, OperationName, DurationMs, BackendResponseCode, LastErrorReason, Url
| order by TimeGenerated desc
| take 20
DurationMs values clustering near 30000 point to a 30-second timeout on the APIM-to-backend call. Values near 20000 with no backend response code point to Application Gateway.
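If you export the DurationMs column, the clustering can be checked mechanically. The sketch below assumes a plain list of durations in milliseconds; the function name, the candidate timeouts, and the 1.5-second tolerance are illustrative choices, not Azure defaults.

```python
from collections import Counter


def cluster_durations(durations_ms, known_timeouts_s=(20, 30, 240), tolerance_ms=1500):
    """Count how many durations fall within tolerance of each candidate timeout."""
    hits = Counter()
    for duration in durations_ms:
        for timeout_s in known_timeouts_s:
            if abs(duration - timeout_s * 1000) <= tolerance_ms:
                hits[timeout_s] += 1
                break  # a duration is attributed to at most one candidate
    return dict(hits)
```

A result dominated by one key suggests which configured timeout the requests are running into; a flat spread suggests a slow backend rather than a fixed cutoff.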
1c. Check Application Insights for end-to-end traces.
requests
| where timestamp > ago(1h)
| where resultCode == "504" or duration > 25000
| project timestamp, name, url, duration, resultCode, cloud_RoleName
| order by timestamp desc
Step 2: Fix Azure API Management Timeout
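Set the backend timeout at the operation, API, or global scope with the timeout attribute on <forward-request> in the backend section. <set-backend-service> only selects the backend and has no timeout attribute. Microsoft's forward-request reference lists a default of 300 seconds, so if requests are cut off at 30 seconds, look for an explicit timeout="30" at a broader policy scope before assuming a platform default.
Option A: Increase timeout (synchronous operations up to 240s):
<policies>
    <inbound>
        <base />
        <set-backend-service base-url="https://your-backend.example.com" />
    </inbound>
    <backend>
        <forward-request timeout="120" />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
The timeout attribute is an integer number of seconds. Values above 240 seconds may not be honored because the underlying network infrastructure can drop idle connections, so longer operations need an async pattern.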
Option B — Async LRO pattern for operations over 4 minutes:
Kick off work asynchronously and return a 202 with a polling URL:
<inbound>
    <base />
    <send-request mode="new" response-variable-name="jobResponse" timeout="10">
        <set-url>https://your-backend.example.com/api/jobs</set-url>
        <set-method>POST</set-method>
        <set-body>@(context.Request.Body.As<string>(preserveContent: true))</set-body>
    </send-request>
    <return-response>
        <set-status code="202" reason="Accepted" />
        <set-header name="Location" exists-action="override">
            <value>@(((IResponse)context.Variables["jobResponse"]).Headers.GetValueOrDefault("Location", ""))</value>
        </set-header>
        <set-header name="Retry-After" exists-action="override">
            <value>5</value>
        </set-header>
    </return-response>
</inbound>
Clients poll the Location URL every Retry-After seconds until the status transitions to Succeeded or Failed.
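On the client side, the polling loop might look like the sketch below. It assumes the job status endpoint returns a JSON object with a status field, which this guide implies but your backend may shape differently. The fetch callable is a hypothetical stand-in for your HTTP client that returns (status_code, headers, body_dict), which keeps the loop easy to test.

```python
import time


def poll_until_done(fetch, location, max_polls=60, default_wait=5.0, sleep=time.sleep):
    """Poll `location` until the job reports Succeeded or Failed.

    `fetch(url)` is a hypothetical callable returning (status_code, headers, body_dict).
    """
    for _ in range(max_polls):
        _status_code, headers, body = fetch(location)
        state = body.get("status", "")
        if state == "Succeeded":
            return body
        if state == "Failed":
            raise RuntimeError(f"job failed: {body.get('error')}")
        # Honor Retry-After when it is a plain number of seconds.
        raw = headers.get("Retry-After", "")
        sleep(float(raw) if raw.isdigit() else default_wait)
    raise TimeoutError(f"job at {location} did not finish after {max_polls} polls")
```

Bounding the loop with max_polls matters: without it, a job that never leaves a non-terminal state turns a server-side timeout into a client that hangs forever.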
Step 3: Fix Azure Function Timeout
Timeout limits are hard-enforced by plan tier:
| Plan | Default Timeout | Maximum Timeout |
|---|---|---|
| Consumption | 5 minutes | 10 minutes |
| Flex Consumption | 30 minutes | Unlimited |
| Premium | 30 minutes | Unlimited |
| Dedicated (App Service) | 30 minutes | Unlimited |
Update host.json to extend timeout up to the plan ceiling:
{
"version": "2.0",
"functionTimeout": "00:10:00"
}
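A quick pre-deployment check can catch a functionTimeout that the plan will not honor. The sketch below hard-codes the maximums from the table above and only understands the hh:mm:ss form; it does not handle other timespan formats such as a days prefix or the special value -1. The function names are hypothetical.

```python
import json
from datetime import timedelta

# Maximum functionTimeout per plan, taken from the table above (None = unlimited).
PLAN_MAX = {
    "Consumption": timedelta(minutes=10),
    "Flex Consumption": None,
    "Premium": None,
    "Dedicated": None,
}


def parse_timespan(value):
    """Parse an 'hh:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_function_timeout(host_json_text, plan):
    """Return (ok, message) for the functionTimeout in a host.json document."""
    config = json.loads(host_json_text)
    raw = config.get("functionTimeout")
    if raw is None:
        return True, "functionTimeout not set; the plan default applies"
    ceiling = PLAN_MAX[plan]
    if ceiling is not None and parse_timespan(raw) > ceiling:
        return False, f"{raw} exceeds the {plan} maximum of {ceiling}"
    return True, f"{raw} is within the {plan} limit"
```

Running this in CI against the host.json in your repository is one way to avoid discovering the ceiling in production.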
If your function requires more than 10 minutes, upgrade the plan:
# Move the app to an existing Premium plan (for example EP1)
az functionapp update \
  --name <function-app-name> \
  --resource-group <rg> \
  --plan <premium-plan-name>
For complex multi-step workflows, refactor to Durable Functions using the fan-out/fan-in pattern. The orchestrator checkpoints state between activities, so the workflow as a whole is not bound by the function timeout. Each individual activity function still is, so keep activities short.
Step 4: Fix Application Gateway Timeout
Application Gateway WAF v2 defaults to 20 seconds for the backend request timeout. Standard v2 defaults to 30 seconds. Update backend HTTP settings:
az network application-gateway http-settings update \
--gateway-name <agw-name> \
--resource-group <rg> \
--name <backend-http-settings-name> \
--timeout 120
# Verify the change
az network application-gateway http-settings show \
--gateway-name <agw-name> \
--resource-group <rg> \
--name <backend-http-settings-name> \
--query requestTimeout
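Because each layer keeps its own clock, the request is cut off by whichever timeout is smallest, and raising one layer alone often just moves the failure to the next. The sketch below checks an ordered chain of timeouts; the layer names and the rule that an outer layer should wait longer than the layer behind it are assumptions for illustration, not Azure requirements.

```python
def tightest_layer(timeouts_s):
    """Return (layer, seconds) for the smallest timeout in the request path."""
    layer = min(timeouts_s, key=timeouts_s.get)
    return layer, timeouts_s[layer]


def misordered_layers(chain):
    """Flag adjacent layers where the outer timeout does not exceed the inner one.

    `chain` is an ordered list of (layer, seconds) from the client inward.
    """
    problems = []
    for (outer, outer_s), (inner, inner_s) in zip(chain, chain[1:]):
        if outer_s <= inner_s:
            problems.append(f"{outer} ({outer_s}s) gives up no later than {inner} ({inner_s}s)")
    return problems
```

For a chain such as client 150s, Application Gateway 20s, APIM 120s, function 600s, the helper reports the gateway as the tightest layer and flags two misordered pairs, which is the situation that produces a 504 at the gateway while APIM and the function are still working.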
Step 5: Handle ARM API Long-Running Operations
ARM returns HTTP 202 Accepted for operations that run longer than a few seconds. The response includes a polling URL:
HTTP/1.1 202 Accepted
Azure-AsyncOperation: https://management.azure.com/subscriptions/.../operations/abc123?api-version=2024-01-01
Retry-After: 30
Your client must poll Azure-AsyncOperation until status is Succeeded, Failed, or Canceled. Azure SDKs handle this automatically. For raw HTTP clients:
import time

import requests


def poll_arm_lro(token, operation_url, retry_after=30, max_polls=40):
    """Poll an ARM Azure-AsyncOperation URL until it reaches a terminal state."""
    headers = {"Authorization": f"Bearer {token}"}
    for _ in range(max_polls):
        resp = requests.get(operation_url, headers=headers, timeout=10)
        resp.raise_for_status()
        body = resp.json()
        status = body.get("status", "")
        if status == "Succeeded":
            return body
        if status in ("Failed", "Canceled"):
            err = body.get("error") or {}
            raise RuntimeError(f"ARM LRO failed [{err.get('code')}]: {err.get('message')}")
        # Prefer the server's Retry-After hint; fall back to the caller's default.
        hint = resp.headers.get("Retry-After", "")
        time.sleep(int(hint) if hint.isdigit() else retry_after)
    raise TimeoutError(f"ARM operation did not complete after {max_polls} polls")
Step 6: Add a Retry Policy as a Safety Net
Transient 504 errors from backend cold starts or brief spikes should be retried before surfacing to callers. Add this APIM retry policy around the <forward-request> element:
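<backend>
    <retry condition="@(context.Response.StatusCode == 504 || context.Response.StatusCode == 502)"
           count="3"
           interval="2"
           max-interval="10"
           delta="2"
           first-fast-retry="false">
        <forward-request timeout="60" buffer-request-body="true" />
    </retry>
</backend>
This retries up to 3 times with exponentially increasing waits, each capped at 10 seconds by max-interval. buffer-request-body="true" lets APIM resend the request body on each retry. A persistent timeout still surfaces to the caller after the final attempt. Because every attempt can run for the full 60 seconds, keep the total below the timeout of any layer in front of APIM, and retry only idempotent operations.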
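Retries multiply the time a caller may wait, so it helps to compute the worst case before choosing count and timeout. The sketch below is an upper bound only: it assumes one initial attempt plus count retries, each running for the full forward-request timeout, and it caps every wait at max-interval instead of modeling APIM's exact backoff formula. The function name is hypothetical.

```python
def worst_case_seconds(count, attempt_timeout_s, max_interval_s):
    """Upper bound on total time spent inside a retry block.

    Assumes one initial attempt plus `count` retries, each allowed to run for
    the full forward-request timeout, with every wait capped at max-interval.
    """
    attempts = count + 1
    return attempts * attempt_timeout_s + count * max_interval_s
```

With the values in the policy above (count 3, timeout 60, max-interval 10) the bound is 270 seconds, which is longer than the 240-second limit discussed earlier. Lowering the per-attempt timeout or the retry count brings the total back under the limits of the layers in front of APIM.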
Verification
After applying fixes, run a targeted latency test:
curl -v -w "\nDNS: %{time_namelookup}s | Connect: %{time_connect}s | TTFB: %{time_starttransfer}s | Total: %{time_total}s\n" \
--max-time 150 \
-H "Ocp-Apim-Subscription-Key: <your-key>" \
https://<your-apim>.azure-api.net/<api-path>
Then confirm in Application Insights that p95 latency is below your new timeout threshold:
requests
| where timestamp > ago(30min)
| where url contains "<api-path>"
| summarize p50=percentile(duration,50), p95=percentile(duration,95), p99=percentile(duration,99)
by bin(timestamp, 5m)
| render timechart
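The same headroom check can be run offline on exported durations. This sketch uses a nearest-rank percentile, which can differ slightly from the approximation the query engine uses, and the 80% margin is an illustrative assumption rather than a recommended value.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def has_headroom(durations_ms, timeout_s, margin=0.8):
    """True when p95 latency stays under `margin` of the configured timeout."""
    return percentile(durations_ms, 95) <= timeout_s * 1000 * margin
```

If has_headroom returns False right after you raised a timeout, the backend is still running close to the limit and the next traffic spike is likely to bring the 504s back.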
Frequently Asked Questions
#!/usr/bin/env bash
# Azure API Timeout Diagnostic Script
# Usage: ./diagnose-api-timeout.sh <resource-group> [<apim-name>] [<function-app-name>] [<agw-name>]
RG="${1:?Usage: $0 <resource-group> [apim-name] [function-app-name] [agw-name]}"
APIM_NAME="${2:-}"
FUNC_NAME="${3:-}"
AGW_NAME="${4:-}"
echo "================================================================"
echo " Azure API Timeout Diagnostics — RG: $RG"
echo " $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "================================================================"
# --- 1. APIM Gateway Logs (requires Log Analytics workspace) ---
if [ -n "$APIM_NAME" ]; then
echo ""
echo "[1] APIM Gateway Logs — last 1h, DurationMs > 25000 or HTTP 504"
WORKSPACE_ID=$(az monitor log-analytics workspace list \
-g "$RG" --query '[0].customerId' -o tsv 2>/dev/null)
if [ -n "$WORKSPACE_ID" ]; then
az monitor log-analytics query \
--workspace "$WORKSPACE_ID" \
--analytics-query "
ApiManagementGatewayLogs
| where TimeGenerated > ago(1h)
| where ResponseCode == 504 or DurationMs > 25000
| project TimeGenerated, OperationName, DurationMs, BackendResponseCode, LastErrorReason
| order by TimeGenerated desc
| take 20" \
--timespan "PT1H" -o table 2>/dev/null || echo "[WARN] Query failed — check workspace permissions"
else
echo "[WARN] No Log Analytics workspace found in RG $RG"
fi
echo ""
echo "[2] APIM APIs in service $APIM_NAME"
az apim api list --resource-group "$RG" --service-name "$APIM_NAME" \
--query '[].{API:name, Path:path, Protocols:protocols}' -o table 2>/dev/null \
|| echo "[WARN] APIM $APIM_NAME not found or no access"
fi
# --- 2. Azure Function timeout config ---
if [ -n "$FUNC_NAME" ]; then
echo ""
echo "[3] Function App Plan and Timeout — $FUNC_NAME"
az functionapp show \
--name "$FUNC_NAME" --resource-group "$RG" \
--query '{Name:name, Kind:kind, State:state, Plan:appServicePlanId}' \
-o json 2>/dev/null || echo "[WARN] Function app $FUNC_NAME not found"
echo ""
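echo "[4] Function App Settings (worker runtime, timeout override)"
# functionTimeout normally lives in host.json; the AzureFunctionsJobHost__functionTimeout
# app setting overrides it when present.
# AzureWebJobsStorage is omitted on purpose: its value is a connection string with a secret.
az functionapp config appsettings list \
--name "$FUNC_NAME" --resource-group "$RG" \
--query "[?name=='FUNCTIONS_WORKER_RUNTIME' || name=='AzureFunctionsJobHost__functionTimeout' || name=='WEBSITE_RUN_FROM_PACKAGE'].{Key:name,Value:value}" \
-o table 2>/dev/null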
fi
# --- 3. Application Gateway timeout ---
if [ -n "$AGW_NAME" ]; then
echo ""
echo "[5] Application Gateway Backend HTTP Settings — $AGW_NAME"
az network application-gateway http-settings list \
--gateway-name "$AGW_NAME" --resource-group "$RG" \
--query '[].{Name:name, TimeoutSec:requestTimeout, Port:port, Protocol:protocol}' \
-o table 2>/dev/null || echo "[WARN] AGW $AGW_NAME not found"
else
echo ""
echo "[5] Discovering Application Gateways in RG $RG"
az network application-gateway list -g "$RG" \
--query '[].{Name:name, State:operationalState, SKU:sku.name}' \
-o table 2>/dev/null
fi
# --- 4. Recent ARM operation failures ---
echo ""
echo "[6] Recent ARM Failed Operations — last 1h"
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
az monitor activity-log list \
--resource-group "$RG" \
--start-time "$START" \
--status Failed \
--query '[].{Time:eventTimestamp, Operation:operationName.value, Status:status.value}' \
-o table 2>/dev/null | head -25
# --- 5. Live latency probe ---
if [ -n "$APIM_NAME" ]; then
echo ""
echo "[7] Live Latency Probe — APIM gateway health endpoint"
GW_URL=$(az apim show -g "$RG" -n "$APIM_NAME" --query 'gatewayUrl' -o tsv 2>/dev/null)
if [ -n "$GW_URL" ]; then
echo "Endpoint: $GW_URL"
curl -s -o /dev/null \
-w "HTTP %{http_code} | DNS %{time_namelookup}s | TCP %{time_connect}s | TTFB %{time_starttransfer}s | Total %{time_total}s\n" \
--max-time 35 "${GW_URL}/status-0123456789abcdef" 2>/dev/null \
|| echo "[INFO] Probe endpoint not reachable; test a real API operation path"
fi
fi
echo ""
echo "Diagnostics complete."
echo "Next steps: review DurationMs values and plan/tier limits above."
echo "Docs: https://aka.ms/apim-timeout-policy | https://aka.ms/functions-timeout"
Error Medic Editorial
The Error Medic Editorial team consists of senior DevOps, SRE, and cloud engineers with hands-on experience running large-scale Azure, AWS, and GCP production workloads. Every guide is based on real incident postmortems, official vendor documentation, and community-verified solutions — no filler, no fluff.
Sources
- https://learn.microsoft.com/en-us/azure/api-management/set-backend-service-policy
- https://learn.microsoft.com/en-us/azure/api-management/advanced-policies#Retry
- https://learn.microsoft.com/en-us/azure/azure-functions/functions-host-json#functiontimeout
- https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/async-operations
- https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-troubleshooting-502
- https://stackoverflow.com/questions/54225734/azure-api-management-backend-service-timeout
- https://github.com/Azure/azure-functions-host/issues/4536