Error Medic

Troubleshooting Istio 504 Gateway Timeout & Connection Refused Errors

Fix Istio 504 Gateway Timeout and Connection Refused errors. Learn how to diagnose Envoy proxy issues, configure VirtualServices, and resolve mesh routing failures.

Key Takeaways

The most common root causes of these errors are:
  • A misconfigured VirtualService or DestinationRule route
  • Upstream service latency exceeding Envoy's default timeout (15s)
  • An mTLS strict-mode mismatch between client and server
  • Resource exhaustion in the Envoy proxy sidecar
Fix Approaches Compared

| Method | When to Use | Time | Risk |
|---|---|---|---|
| Increase VirtualService timeout | Upstream genuinely needs more than 15s to process requests | 2 mins | Low |
| Fix mTLS configuration | Connection refused or 503 UC errors during an mTLS migration | 5 mins | Medium |
| Adjust proxy resources | Envoy sidecar OOM kills or CPU throttling | 10 mins | Medium |
| Scale upstream pods | Upstream service is overwhelmed, causing latency | 5 mins | Low |

Understanding the Error

When working with Istio service mesh, two of the most common and frustrating errors are 504 Gateway Timeout and Connection Refused (often surfacing as 503 Service Unavailable with UC or UF response flags). Because Istio injects an Envoy proxy sidecar alongside every application container, network requests hop through multiple proxies. A failure at any point in this chain can trigger these errors.

Diagnosing 504 Gateway Timeout

By default, Istio's Envoy proxies enforce a strict 15-second timeout on all HTTP requests. If your backend service (e.g., a reporting engine or slow database query) takes longer than 15 seconds to respond, Envoy will terminate the connection and return a 504 Gateway Timeout.

You might see logs like this in your ingress gateway or sidecar:

[2023-10-24T12:00:00.000Z] "POST /api/v1/reports HTTP/1.1" 504 UT "-" "-" 0 0 15000 - "-" ...

The UT flag stands for Upstream Request Timeout.

Diagnosing Connection Refused (503 UC/UF)

Connection Refused typically indicates that the Envoy proxy cannot establish a TCP connection to the upstream service. This often presents as a 503 Service Unavailable with specific response flags in the Envoy access logs:

  • UC (Upstream Connection Termination): The upstream connection was terminated, often due to mTLS misconfigurations.
  • UF (Upstream Connection Failure): Envoy couldn't connect to the upstream, often because the pod is down or listening on the wrong port.

Step 1: Check Envoy Access Logs

The first step in any Istio troubleshooting session is to look at the Envoy access logs. Inspect the logs of the istio-proxy container on both the client and the server pods (e.g., kubectl logs <pod-name> -c istio-proxy), and run istioctl proxy-status to confirm the sidecars are in sync with the control plane.

Look for the response flags (like UT, UC, UF, URX).
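If no access logs appear at all, mesh-wide access logging may simply be disabled. A minimal sketch of a Telemetry resource that enables it, assuming the built-in envoy access-log provider and the default istio-system root namespace:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system   # root namespace: applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy           # built-in stdout access-log provider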

Step 2: Validate VirtualService Timeouts

If you are hitting the default 15s timeout, you need to explicitly configure the timeout in your VirtualService.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-slow-service
spec:
  hosts:
  - my-slow-service
  http:
  - route:
    - destination:
        host: my-slow-service
    timeout: 60s # Increase timeout to 60 seconds
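Note that retries interact with the route timeout: timeout caps the total time across all attempts, while perTryTimeout bounds each individual attempt. A sketch combining the two, reusing the illustrative my-slow-service name:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-slow-service
spec:
  hosts:
  - my-slow-service
  http:
  - route:
    - destination:
        host: my-slow-service
    timeout: 60s            # total budget across all retry attempts
    retries:
      attempts: 2           # up to 2 retries on failure
      perTryTimeout: 20s    # each attempt is cut off after 20s
      retryOn: 5xx,reset    # retry on 5xx responses and connection resets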

Step 3: Verify mTLS Settings

If you see Connection Refused or 503 UC, there might be an mTLS mismatch. For instance, the client might be sending plaintext while the server is expecting strict mTLS, or vice versa.

Use istioctl experimental describe pod <pod-name> to verify the TLS settings that apply to a workload (the older istioctl authn tls-check command was removed in Istio 1.5).

Ensure your PeerAuthentication policies align with your DestinationRule traffic policies. If you are migrating to strict mTLS, try setting it to PERMISSIVE temporarily to see if the issue resolves.
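For example, a workload-scoped PeerAuthentication set to PERMISSIVE accepts both mTLS and plaintext traffic during migration (the my-slow-service name and app label here are illustrative):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: my-slow-service
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-slow-service
  mtls:
    mode: PERMISSIVE   # accept both mTLS and plaintext while migrating

If clients still fail after this change, check that any DestinationRule for the host uses tls.mode: ISTIO_MUTUAL rather than DISABLE.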

Step 4: Check Pod Resource Limits

If the Envoy proxy sidecar runs out of memory or is heavily CPU throttled, it can drop connections or induce latency, leading to timeouts. Check the metrics for the istio-proxy container. If it's hitting its limits, increase the CPU/Memory requests and limits in the pod deployment annotations.
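Sidecar resources can be overridden per workload with the sidecar.istio.io annotations on the pod template. A sketch with illustrative values (deployment name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-slow-service
spec:
  selector:
    matchLabels:
      app: my-slow-service
  template:
    metadata:
      labels:
        app: my-slow-service
      annotations:
        sidecar.istio.io/proxyCPU: "250m"          # CPU request for istio-proxy
        sidecar.istio.io/proxyCPULimit: "1"        # CPU limit
        sidecar.istio.io/proxyMemory: "256Mi"      # memory request
        sidecar.istio.io/proxyMemoryLimit: "512Mi" # memory limit
    spec:
      containers:
      - name: app
        image: my-registry/my-slow-service:latest  # illustrative image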

Useful Diagnostic Commands

# Check sidecar sync status across the mesh
istioctl proxy-status

# View Envoy access logs for a specific pod
kubectl logs <pod-name> -c istio-proxy --tail 100

# Check authorization policies applied to a pod
istioctl experimental authz check <client-pod-name>

# Analyze Istio configuration for potential issues
istioctl analyze -n <your-namespace>

Error Medic Editorial

Error Medic Editorial is a team of Senior Site Reliability Engineers and DevOps practitioners dedicated to solving complex infrastructure and service mesh challenges.
