Error Medic

Azure API Timeout: Fixing 'The request timed out' and 408/504 Errors in Azure APIs

Fix Azure API timeout errors (408, 504, RequestTimeout) fast. Covers ARM, APIM, Function App, and SDK timeouts with real commands and config fixes.

Key Takeaways
  • Azure API timeouts most commonly stem from four root causes: the 60-second ARM synchronous gateway limit, Azure API Management (APIM) policy timeout caps (240s default), Azure Function host.json execution timeout mismatches, and SDK/HttpClient default timeouts not matching backend SLAs.
  • Long-running operations should use Azure's async polling pattern (202 Accepted + Location header + Retry-After) rather than blocking synchronous calls—synchronous calls hitting the ARM 60-second gateway limit will always result in a 504 Gateway Timeout.
  • Quick fix summary: increase APIM backend timeout via policy XML, raise Function timeout in host.json (max 10 min Consumption, unlimited Premium/Dedicated), set explicit HttpClient.Timeout in SDK code, and enable retry policies with exponential back-off using Polly or Azure SDK built-in retries.
Fix Approaches Compared
Method | When to Use | Time to Implement | Risk
Increase APIM backend timeout policy | API call routed through Azure API Management exceeds default 240s | 5–15 min | Low: scoped to policy XML
Raise Azure Function timeout in host.json | Function execution exceeds default 5 min (Consumption) or 30 min (Dedicated) | 2–5 min | Low: requires redeploy
Set explicit HttpClient / SDK timeout | Client-side SDK times out before server responds; default 100s HttpClient | 10–20 min | Low: code change + test
Switch to async LRO pattern (202 + polling) | ARM operation or long compute exceeds 60s synchronous gateway limit | 1–4 hours | Medium: architectural change
Enable Polly retry with exponential back-off | Transient 408/503/504 errors under load or during Azure maintenance | 30–60 min | Low: additive policy
Scale up / out App Service Plan | Timeout caused by CPU/memory saturation on compute tier | 5 min | Low: cost increase
Move to Premium/Dedicated Function plan | Consumption plan cold-start + 10 min hard limit is insufficient | 30 min | Low: cost increase

Understanding Azure API Timeout Errors

Azure surfaces timeout failures across several layers, each with distinct error messages and HTTP status codes:

  • 408 Request Timeout — the client or an upstream proxy gave up waiting before the server sent a full response.
  • 504 Gateway Timeout — Azure's front-end gateway (ARM, APIM, Application Gateway, or Front Door) forwarded your request but the backend did not reply within the gateway's limit.
  • CloudException: The request timed out. — thrown by the Azure .NET SDK when HttpClient.Timeout (default 100 seconds) elapses.
  • azure.core.exceptions.ServiceResponseError: ('Connection aborted.', timeout('timed out')) — Python SDK equivalent.
  • Error: connect ETIMEDOUT / Error: socket hang up — Node.js @azure/arm-* SDK or axios-based calls.
  • DeploymentFailed: The resource operation completed with terminal provisioning state 'Failed'. Error: Gateway Timeout — ARM deployment calling a slow custom script extension or external endpoint.

The Azure Timeout Stack

Requests traverse multiple timeout boundaries before reaching your backend:

Client SDK timeout (default: 100 s, HttpClient)
  └─ Azure API Management timeout (default: 240 s, configurable)
      └─ Azure Application Gateway / Front Door (default: 30 s / 60 s, configurable)
          └─ ARM gateway synchronous limit (60 s hard)
              └─ Your backend / Function App / Container

A timeout fires at whichever boundary is hit first. Raising only the APIM policy timeout does not help if the client SDK gives up earlier, for example at 30 seconds.
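
To make the "first boundary wins" rule concrete, here is a minimal sketch that takes the configured timeout of each layer in seconds and prints the smallest one. The function name effective_timeout and the sample values are illustrative assumptions, not part of any Azure tooling.

```shell
#!/usr/bin/env bash
# effective_timeout (hypothetical helper): print the smallest of the
# per-layer timeouts passed as arguments, in seconds. The layer that owns
# this value is the one that fires first.
effective_timeout() {
  local min="" t
  for t in "$@"; do
    if [[ -z "$min" || "$t" -lt "$min" ]]; then
      min="$t"
    fi
  done
  echo "$min"
}

# Example: client SDK 30 s, APIM 240 s, ARM gateway 60 s
effective_timeout 30 240 60   # prints 30
```

If the printed value belongs to a layer you have not tuned yet, that layer is the one to fix first.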


Step 1: Identify Which Layer Is Timing Out

Check the HTTP status code and response headers.

An APIM-generated timeout includes x-ms-request-id and an APIM policy error body:

{
  "statusCode": 504,
  "message": "The backend service did not respond in time."
}

An ARM gateway timeout looks like:

{
  "error": {
    "code": "GatewayTimeout",
    "message": "The gateway did not receive a response from the backend service within the time period."
  }
}

A client-side SDK timeout (C#):

Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'RequestTimeout'
  ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.

Use curl -v with --max-time to isolate the layer:

# Test the ARM endpoint directly (bypassing APIM)
curl -v --max-time 120 \
  -H "Authorization: Bearer $(az account get-access-token --query accessToken -o tsv)" \
  "https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{name}?api-version=2022-03-01"

# Test through APIM gateway
curl -v --max-time 120 \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  "https://{apim-name}.azure-api.net/{api-path}"

If the direct call succeeds and the APIM call times out, the issue is in APIM policy or backend URL configuration. If both fail, the problem is in the backend itself.

Check Azure Monitor / Application Insights for the request duration:

# Query Application Insights via Azure CLI
az monitor app-insights query \
  --app {app-insights-name} \
  --resource-group {rg} \
  --analytics-query "
    requests
    | where timestamp > ago(1h)
    | where resultCode in ('408','504')
    | summarize count(), avg(duration), max(duration) by operation_Name
    | order by max_duration desc
  "

Step 2: Fix APIM Backend Timeout

Azure API Management has a default backend timeout of 240 seconds set at the API level. Individual operations can override this. The maximum configurable value is 240 seconds for the Consumption tier and up to 3600 seconds (1 hour) for Developer, Basic, Standard, and Premium tiers.

In the APIM policy editor, add or modify the <backend> section:

<policies>
  <inbound>
    <base />
  </inbound>
  <backend>
    <!-- Increase timeout to 3 minutes for slow analytics endpoints -->
    <forward-request timeout="180" follow-redirects="true" />
  </backend>
  <outbound>
    <base />
  </outbound>
  <on-error>
    <base />
  </on-error>
</policies>

Deploy via Azure CLI:

az apim api operation policy create \
  --resource-group {rg} \
  --service-name {apim-name} \
  --api-id {api-id} \
  --operation-id {operation-id} \
  --policy-format xml \
  --value @policy.xml
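
Before deploying, it can help to confirm what timeout the policy file actually sets. The following is a small sketch, assuming the policy is stored in a local file such as policy.xml; policy_timeout and check_policy_timeout are hypothetical helper names, and the cap you pass in is whatever limit applies to your tier.

```shell
#!/usr/bin/env bash
# policy_timeout (hypothetical helper): print the timeout attribute of the
# first <forward-request> element in a policy file, or nothing if absent.
policy_timeout() {
  sed -n 's/.*<forward-request[^>]*[[:space:]]timeout="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# check_policy_timeout (hypothetical helper): compare that value with a cap.
# Usage: check_policy_timeout <policy-file> <cap-seconds>
check_policy_timeout() {
  local file="$1" cap="$2" t
  t=$(policy_timeout "$file")
  if [[ -z "$t" ]]; then
    echo "no explicit forward-request timeout found"
    return 0
  fi
  if (( t > cap )); then
    echo "timeout ${t}s exceeds the ${cap}s cap"
    return 1
  fi
  echo "timeout ${t}s is within the ${cap}s cap"
}

# Example (assumes policy.xml exists in the current directory):
#   check_policy_timeout policy.xml 240
```

The sed expression is a plain text match, not an XML parser, so it assumes the timeout attribute sits on the same line as the forward-request tag.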

Step 3: Fix Azure Function Timeout

Azure Functions timeouts are controlled by functionTimeout in host.json:

Plan | Default | Maximum
Consumption | 5 min | 10 min
Premium | 30 min | Unlimited
Dedicated (App Service) | 30 min | Unlimited

host.json change:

{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}

For Premium/Dedicated plans, set "functionTimeout": "-1" (no timeout) or a specific duration like "02:00:00".
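
The limits in the table can be checked mechanically before you redeploy. This is a minimal sketch under stated assumptions: to_seconds and check_function_timeout are hypothetical helpers, the plan label "Consumption" is just the name used in this guide (the Azure CLI may report a different tier string), and only the hh:mm:ss and "-1" forms are handled.

```shell
#!/usr/bin/env bash
# to_seconds (hypothetical helper): convert hh:mm:ss to seconds.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# check_function_timeout (hypothetical helper): validate a functionTimeout
# value against the plan limits listed in the table above.
# Usage: check_function_timeout <plan> <functionTimeout>
check_function_timeout() {
  local plan="$1" value="$2" secs
  if [[ "$value" == "-1" ]]; then
    if [[ "$plan" == "Consumption" ]]; then
      echo "invalid: unbounded timeout is not available on Consumption"
      return 1
    fi
    echo "ok: unbounded"
    return 0
  fi
  secs=$(to_seconds "$value")
  if [[ "$plan" == "Consumption" ]] && (( secs > 600 )); then
    echo "invalid: ${secs}s exceeds the 600s Consumption maximum"
    return 1
  fi
  echo "ok: ${secs}s"
}

check_function_timeout Consumption 00:10:00   # prints "ok: 600s"
```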

Redeploy after the change:

func azure functionapp publish {function-app-name} --build remote

Step 4: Fix Client-Side SDK Timeout (C# / .NET)

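The Azure SDK for .NET uses HttpClient under the hood, and the default HttpClient.Timeout is 100 seconds. For ARM management clients authenticated with DefaultAzureCredential, configure retries through ArmClientOptions.Retry and, if you need a longer network timeout, supply a custom HttpClientTransport:

using System;
using System.Net.Http;
using Azure.Core;
using Azure.Core.Pipeline;
using Azure.Identity;
using Azure.ResourceManager;

var options = new ArmClientOptions();
options.Retry.MaxRetries = 3;
options.Retry.Delay = TimeSpan.FromSeconds(2);
options.Retry.MaxDelay = TimeSpan.FromSeconds(30);
options.Retry.Mode = RetryMode.Exponential;
// Per-attempt network timeout enforced by the SDK pipeline (default 100 s)
options.Retry.NetworkTimeout = TimeSpan.FromSeconds(300);

// For custom network timeout, inject HttpClientTransport
var httpClient = new HttpClient { Timeout = TimeSpan.FromSeconds(300) };
var transport = new HttpClientTransport(httpClient);
options.Transport = transport;

// subscriptionId is assumed to hold your subscription ID as a string
var client = new ArmClient(new DefaultAzureCredential(), subscriptionId, options);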
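
To see what an exponential retry schedule with a 2-second base delay and a 30-second cap looks like, the sketch below computes the delay for each attempt as base × 2^attempt, capped at the maximum. It illustrates the general technique only: backoff_delay is a hypothetical helper, and the SDK's real implementation may add jitter or differ in detail.

```shell
#!/usr/bin/env bash
# backoff_delay (hypothetical helper): delay in seconds before a retry.
# Usage: backoff_delay <attempt> [base-seconds] [max-seconds]
# attempt is zero-based; defaults mirror the 2 s base and 30 s cap above.
backoff_delay() {
  local attempt="$1" base="${2:-2}" max="${3:-30}"
  local delay=$(( base * (1 << attempt) ))
  if (( delay > max )); then
    delay=$max
  fi
  echo "$delay"
}

# Print the schedule for three retries: 2s, 4s, 8s
for attempt in 0 1 2; do
  echo "attempt $attempt: $(backoff_delay "$attempt")s"
done
```

With these values the cap only takes effect from the fifth retry onward (attempt 4 would be 32 s, capped to 30 s).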

Step 5: Handle Long-Running Operations (LRO) Correctly

ARM operations that take longer than 60 seconds must follow the LRO pattern. ARM will return a 202 Accepted with a Location or Azure-AsyncOperation header. You must poll this URL until the state is Succeeded or Failed.

Azure SDKs handle this automatically when you pass WaitUntil.Completed (or call WaitForCompletionAsync() on the returned operation):

// Correct: await the full LRO
var operation = await resourceGroupCollection.CreateOrUpdateAsync(
    WaitUntil.Completed,   // <-- polls until done
    resourceGroupName,
    new ResourceGroupData(AzureLocation.EastUS));

var resourceGroup = operation.Value;

If you pass WaitUntil.Started (or call the REST API directly), you receive a 202 and must poll manually:

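# Start the operation and capture the async operation URL from the response headers.
# -D - writes the headers to stdout and -o /dev/null discards the body
# (-I would send a HEAD request, which conflicts with -X PUT and -d).
STATUS_URL=$(curl -s -D - -o /dev/null -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"location":"eastus"}' \
  "https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}?api-version=2021-04-01" \
  | grep -i '^azure-asyncoperation:' | awk '{print $2}' | tr -d '\r')

# Poll until the operation reaches a terminal state
while true; do
  STATE=$(curl -s -H "Authorization: Bearer $TOKEN" "$STATUS_URL" | jq -r '.status')
  echo "State: $STATE"
  [[ "$STATE" == "Succeeded" || "$STATE" == "Failed" || "$STATE" == "Canceled" ]] && break
  sleep 10
done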
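
A loop like the one above never exits if the operation stays in a non-terminal state, so a bounded variant is often safer. The sketch below is one possible shape, not an official pattern: poll_until_terminal is a hypothetical helper that runs any command printing the current status, stops on a terminal state, and gives up after a fixed number of attempts.

```shell
#!/usr/bin/env bash
# poll_until_terminal (hypothetical helper): run a status command until it
# prints a terminal state or the attempt budget is used up.
# Usage: poll_until_terminal <max-attempts> <interval-seconds> <command...>
# Returns 0 on Succeeded, 1 on Failed/Canceled, 2 when attempts run out.
poll_until_terminal() {
  local max_attempts="$1" interval="$2"
  shift 2
  local attempt state
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    state=$("$@")
    case "$state" in
      Succeeded)       echo "$state"; return 0 ;;
      Failed|Canceled) echo "$state"; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "TimedOut"
  return 2
}

# Example wiring (assumes TOKEN and STATUS_URL are already set):
#   get_status() { curl -s -H "Authorization: Bearer $TOKEN" "$STATUS_URL" | jq -r '.status'; }
#   poll_until_terminal 30 10 get_status
```

Passing the status command as arguments keeps the loop independent of curl, which also makes it easy to exercise with a stub.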

Step 6: Verify Application Gateway / Front Door Timeouts

If requests pass through Azure Application Gateway, its default backend request timeout is 30 seconds. Increase it:

# Update Application Gateway backend HTTP settings timeout
az network application-gateway http-settings update \
  --gateway-name {gateway-name} \
  --resource-group {rg} \
  --name {http-settings-name} \
  --timeout 120

For Azure Front Door (Standard/Premium), the origin response timeout defaults to 60 seconds and is configurable up to 240 seconds in the origin group settings.

Frequently Asked Questions

#!/usr/bin/env bash
# Azure API Timeout Diagnostic Script
# Usage: ./diagnose-azure-timeout.sh <subscription-id> <resource-group>

SUB="$1"
RG="$2"

if [[ -z "$SUB" || -z "$RG" ]]; then
  echo "Usage: $0 <subscription-id> <resource-group>"
  exit 1
fi

echo "=== Step 1: Acquire access token ==="
TOKEN=$(az account get-access-token --subscription "$SUB" --query accessToken -o tsv)
if [[ -z "$TOKEN" ]]; then
  echo "ERROR: Failed to acquire token. Run 'az login' first."
  exit 1
fi
echo "Token acquired."

echo ""
echo "=== Step 2: Test ARM endpoint latency ==="
time curl -s -o /dev/null -w "HTTP %{http_code} | Time: %{time_total}s | DNS: %{time_namelookup}s | Connect: %{time_connect}s | TLS: %{time_appconnect}s\n" \
  --max-time 90 \
  -H "Authorization: Bearer $TOKEN" \
  "https://management.azure.com/subscriptions/$SUB/resourceGroups/$RG?api-version=2021-04-01"

echo ""
echo "=== Step 3: Check Function App timeout settings ==="
FUNC_APPS=$(az functionapp list --subscription "$SUB" --resource-group "$RG" --query "[].name" -o tsv)
for APP in $FUNC_APPS; do
  echo "--- Function App: $APP ---"
  PLAN=$(az functionapp show --name "$APP" --resource-group "$RG" --query "appServicePlanId" -o tsv)
  PLAN_SKU=$(az appservice plan show --ids "$PLAN" --query "sku.tier" -o tsv 2>/dev/null)
  echo "  Plan tier: $PLAN_SKU"
  RUNTIME_VERSION=$(az functionapp config appsettings list \
    --name "$APP" --resource-group "$RG" \
    --query "[?name=='FUNCTIONS_EXTENSION_VERSION'].value" -o tsv)
  echo "  Functions runtime version: $RUNTIME_VERSION"
  # functionTimeout lives in host.json and is not returned by these commands;
  # print the app's general state so the plan tier above can be matched to its limits
  az functionapp show --name "$APP" --resource-group "$RG" \
    --query "{state:state, httpsOnly:httpsOnly, kind:kind}" -o table
done

echo ""
echo "=== Step 4: Query Application Insights for 408/504 errors (last 1 hour) ==="
AI_APPS=$(az monitor app-insights component list --resource-group "$RG" --query "[].name" -o tsv 2>/dev/null)
for AI in $AI_APPS; do
  echo "--- App Insights: $AI ---"
  az monitor app-insights query \
    --app "$AI" \
    --resource-group "$RG" \
    --analytics-query "
      requests
      | where timestamp > ago(1h)
      | where resultCode in ('408', '504', '0')
      | summarize count=count(), avg_duration_ms=avg(duration), max_duration_ms=max(duration)
        by resultCode, operation_Name
      | order by max_duration_ms desc
      | take 20
    " \
    --output table 2>/dev/null || echo "  (Could not query - check App Insights connection)"
done

echo ""
echo "=== Step 5: Check APIM instances ==="
APIMS=$(az apim list --resource-group "$RG" --query "[].name" -o tsv 2>/dev/null)
for APIM in $APIMS; do
  echo "--- APIM: $APIM ---"
  TIER=$(az apim show --name "$APIM" --resource-group "$RG" --query "sku.name" -o tsv)
  echo "  SKU: $TIER"
  if [[ "$TIER" == "Consumption" ]]; then
    echo "  WARNING: Consumption tier max backend timeout is 240s (hard limit)"
  else
    echo "  Max configurable backend timeout: 3600s"
  fi
  echo "  APIs:"
  az apim api list --resource-group "$RG" --service-name "$APIM" \
    --query "[].{name:name, displayName:displayName}" -o table
done

echo ""
echo "=== Step 6: Check Application Gateway backend timeouts ==="
AGWS=$(az network application-gateway list --resource-group "$RG" --query "[].name" -o tsv 2>/dev/null)
for AGW in $AGWS; do
  echo "--- App Gateway: $AGW ---"
  az network application-gateway http-settings list \
    --gateway-name "$AGW" \
    --resource-group "$RG" \
    --query "[].{name:name, timeout:requestTimeout, port:port}" \
    -o table
done

echo ""
echo "=== Diagnostics complete ==="
echo "Review output above and apply fixes per the troubleshooting guide."

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps, SRE, and cloud engineers with combined experience across Azure, AWS, and GCP production environments. We write actionable troubleshooting guides grounded in real incident post-mortems and official vendor documentation.
