Error Medic

HAProxy 503 Service Unavailable: Complete Troubleshooting Guide

Fix HAProxy 503 Service Unavailable fast. Root causes: health checks, maxconn limits, no backend pool—plus step-by-step fixes with real commands.

Key Takeaways
  • 503 in HAProxy almost always means every backend server in a pool is marked DOWN, MAINT, or DRAIN—HAProxy generates this response itself without contacting any backend
  • Run `echo 'show servers state' | socat stdio /var/run/haproxy/admin.sock` immediately; a DOWN server list with a failed health check message is the most common culprit
  • Pool exhaustion under high load can also trigger 503s when every server hits its maxconn limit and the queue timeout expires—check the `qcur` column in `show stat`
  • Quick fix: re-enable a downed server with `echo 'set server <backend>/<server> state ready' | socat stdio /var/run/haproxy/admin.sock` without reloading HAProxy
  • 502 means HAProxy reached a backend but got garbage back; 504 means the backend was too slow; 503 means HAProxy never tried—diagnose accordingly
Fix Approaches Compared
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Re-enable server via stats socket | Server manually set to MAINT or DRAIN | < 1 min | Low |
| Restart crashed backend service | Backend process is down (connection refused on health check) | 1–5 min | Low |
| Fix health check endpoint or config | Backend is up but /health returns wrong status code | 10–30 min + reload | Low |
| Increase maxconn / queue limits | Pool exhaustion under high traffic (qcur non-zero) | 5–15 min + reload | Medium |
| Add default_backend directive | Frontend ACL has no catch-all rule | 5 min + reload | Low |
| Add backup server for maintenance page | Need graceful degradation instead of raw 503 | 15–30 min + reload | Low |
| Roll back recent deployment | 503s began immediately after a config or app deploy | 5–10 min | Medium |

Understanding HAProxy 503 Service Unavailable

When HAProxy returns HTTP/1.1 503 Service Unavailable, it is communicating that it has no healthy backend server available to forward the request to. Unlike a 502 Bad Gateway (backend responded with an invalid HTTP response) or a 504 Gateway Timeout (backend accepted the connection but did not respond in time), a 503 means HAProxy made zero attempt to contact a backend—because there are none available to try.

The verbatim response body you will see in a browser or curl output:

HTTP/1.1 503 Service Unavailable
Content-Type: text/html

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>

HAProxy synthesizes this response entirely on its own. This is a critical distinction: your backend applications are not involved in producing this error, and restarting them will not help until HAProxy registers them as healthy again.
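A quick way to confirm the 503 came from HAProxy rather than an application: with the default errorfile, HAProxy's response body contains the exact sentence shown above. A minimal shell check, where the `body` variable stands in for a response you captured with curl:

```shell
# $body stands in for a captured response, e.g.: body=$(curl -s http://your-lb/)
body='<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>'

# The default HAProxy errorfile always contains this sentence; a 503
# generated by an application normally does not.
if grep -q 'No server is available to handle this request' <<<"$body"; then
  echo "HAProxy-generated 503: the backend pool is empty"
else
  echo "503 relayed from an application"
fi
```

If your HAProxy uses a custom `errorfile 503`, adjust the marker string accordingly.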


Root Causes in Order of Frequency

1. All Backend Servers Are Marked DOWN by Health Checks

The most common cause by far. HAProxy continuously polls backend servers using TCP or HTTP health checks. When a previously healthy server fails a number of consecutive checks equal to its fall parameter, HAProxy marks it DOWN and stops routing traffic to it. If every server in a backend block is DOWN simultaneously, the very next request to that backend produces a 503.

Example backend configuration showing health check parameters:

backend api_servers
    option httpchk GET /health
    http-check expect status 200
    server web1 10.0.1.10:8080 check inter 5s fall 3 rise 2
    server web2 10.0.1.11:8080 check inter 5s fall 3 rise 2

With fall 3 and inter 5s, it takes 15 seconds of consecutive failures before HAProxy marks a server DOWN. With rise 2, the server needs only 2 successful checks to return to UP—so recovery is faster than demotion.
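The detection and recovery windows follow directly from those two parameters; a quick sanity check of the arithmetic:

```shell
# Detection/recovery windows implied by the config above: a server is
# demoted after `fall` failed checks spaced `inter` apart, and promoted
# after `rise` successful checks.
inter=5   # seconds between checks
fall=3    # consecutive failures to mark DOWN
rise=2    # consecutive successes to mark UP

echo "worst-case time to DOWN: $((inter * fall))s"   # 15s
echo "time to recover to UP:   $((inter * rise))s"   # 10s
```

Note that a separate downinter parameter, if set, changes the check interval while a server is DOWN, so recovery timing can differ in practice.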

2. Server Manually Set to MAINT or DRAIN

Operators frequently place servers into maintenance mode (MAINT) via the runtime API or the web stats UI—for example during a deployment—and forget to re-enable them afterward. A server in MAINT is treated identically to a DOWN server for routing purposes. A server in DRAIN accepts no new sessions, though existing and persistent connections are allowed to finish. Either state causes 503s if it is the only server in the pool.

3. Backend Pool Exhaustion (maxconn / queue timeout)

If every server in a backend has hit its individual maxconn limit, new requests wait in the backend's queue; when timeout queue expires before a slot opens, HAProxy returns 503 for the queued request. The subtle difference: the backends are alive and healthy, but HAProxy's queuing mechanism is overwhelmed. Under sustained high traffic this is common and frequently misdiagnosed as a backend problem.
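For back-of-envelope planning, the in-flight ceiling is simply the sum of the per-server maxconn values; everything beyond it queues and then, once timeout queue expires, becomes a 503. A sketch with hypothetical values:

```shell
# Hypothetical pool: 2 servers, each configured with maxconn 5000.
servers=2
per_server_maxconn=5000

ceiling=$((servers * per_server_maxconn))
echo "in-flight ceiling: $ceiling connections"
# Requests beyond the ceiling wait in the backend queue; once
# `timeout queue` expires without a free slot, each gets a 503.
```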

4. No Matching use_backend or Missing default_backend

If a frontend uses ACL-based routing (use_backend) but no rule matches the incoming request, and there is no default_backend directive, HAProxy has nowhere to send the request and immediately returns 503. This is a configuration error that is easy to miss when adding new ACL rules.

5. Failed Reload Leaving Stale Worker

A botched haproxy -sf reload (due to a config syntax error or a race condition) can leave the old worker process running but in a degraded state where it cannot reach backends. HAProxy's seamless reload is not always seamless—verify with ps aux | grep haproxy that only the expected number of worker processes are running.


Step-by-Step Diagnosis

Step 1: Check the Stats Socket (Fastest First Step)

The HAProxy Unix domain socket exposes the full runtime state in milliseconds:

# Show all backends and server states
echo 'show servers state' | socat stdio /var/run/haproxy/admin.sock

# Alternative: check via HTTP stats page if enabled
curl -s http://localhost:9000/stats

Look for servers with status equal to DOWN, MAINT, or DRAIN. The last_chk field shows the exact health check failure message—for example L7STS (HTTP status mismatch), L4CON (TCP connection refused), or L4TOUT (TCP connect timeout).
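To spot offenders quickly, filter the output for servers whose operational state is not RUNNING. The sketch below runs against a canned sample (field positions follow the header line HAProxy prints; treating srv_op_state 2 as RUNNING is an assumption worth verifying against the header on your version). In production, pipe the real socket output in instead:

```shell
# Canned sample of `show servers state` output; in production replace with:
#   echo 'show servers state' | socat stdio /var/run/haproxy/admin.sock
sample='1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
3 api_servers 1 web1 10.0.1.10 0 0
3 api_servers 2 web2 10.0.1.11 2 0'

# srv_op_state (field 6): 2 = RUNNING; anything else cannot take traffic.
awk '!/^#/ && NF > 2 && $6 != 2 {print $2 "/" $4 " not running (op_state=" $6 ")"}' <<<"$sample"
```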

Step 2: Read the HAProxy Access Log

HAProxy logs every connection with a structured termination state. A 503 caused by no available server produces a distinctive log line:

Feb 23 14:02:11 proxy1 haproxy[12345]: 10.0.0.5:51234 [23/Feb/2026:14:02:11.042] frontend~ api_servers/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 1/1/0/0/0 0/0 "GET /api/v1/users HTTP/1.1"

Key fields to understand:

  • <NOSRV> — HAProxy selected no server; the backend pool had no viable candidates
  • SC-- — Session terminated at the Connect phase on the Server side; HAProxy never opened a TCP connection to a backend
  • 0/-1/-1/-1/0 — In the timer breakdown, -1 for the queue wait, backend connect, and backend response timers means those phases were never entered: HAProxy gave up before selecting a server
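To separate HAProxy-generated 503s from 503s relayed by a live backend, filter the access log for <NOSRV>. The sketch below uses two canned log lines (log path and exact format vary by distro and config); point the greps at your real log instead:

```shell
# Two canned log lines: the first is an HAProxy-generated 503 (<NOSRV>),
# the second a 503 that a live backend actually returned.
lines='haproxy[12345]: 10.0.0.5:51234 frontend~ api_servers/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC--
haproxy[12345]: 10.0.0.6:51240 frontend~ api_servers/web1 2/0/1/12/15 503 98 - - ----'

echo "empty-pool 503s: $(grep -c 'NOSRV' <<<"$lines")"
echo "backend 503s:    $(grep ' 503 ' <<<"$lines" | grep -vc 'NOSRV')"
```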

If logging is not detailed enough, enable full HTTP logging:

global
    log /dev/log local0 debug

defaults
    log global
    option httplog
    option dontlognull
    option log-health-checks

The option log-health-checks directive is especially useful—it emits a log line every time a server transitions between UP and DOWN, giving you a timeline of when the problem started.

Step 3: Test Each Backend Server Directly from the HAProxy Host

Bypassing HAProxy entirely, connect to each backend from the HAProxy server's network namespace:

# Test raw TCP reachability
nc -zv -w3 10.0.1.10 8080

# Test the exact HTTP health check endpoint
curl -v --max-time 5 http://10.0.1.10:8080/health

# Confirm the backend process is listening
ss -tlnp | grep 8080

# View backend application logs for crashes or errors
journalctl -u your-app.service --since '10 minutes ago' -f

If nc and curl succeed from the HAProxy host but HAProxy marks the server DOWN, the problem is in the health check configuration—mismatched port, wrong HTTP method, or an unexpected response code.

Step 4: Check for Pool Exhaustion

# Show global connection stats
echo 'show info' | socat stdio /var/run/haproxy/admin.sock | grep -E 'MaxConn|CurrConns|MaxConnRate|Uptime'

# Show per-backend queue depths
# In the show stat CSV, qcur is field 3; scur is field 5, slim field 7, status field 18
echo 'show stat' | socat stdio /var/run/haproxy/admin.sock | \
  cut -d',' -f1-7,18 | column -t -s,

If qcur is non-zero and climbing during the 503 window, you have pool exhaustion. The scur (sessions current) reaching slim (sessions limit) on individual servers confirms maxconn saturation.
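The saturation test is mechanical: a server is at its limit when scur equals slim. The awk sketch below runs on a canned two-line extract of the show stat CSV (leading fields, 1-indexed: pxname, svname, qcur, qmax, scur, smax, slim); feed it the live socket output in production:

```shell
# Canned extract of `show stat` CSV (pxname,svname,qcur,qmax,scur,smax,slim)
csv='api_servers,web1,12,40,5000,5120,5000
api_servers,web2,0,3,4100,5120,5000'

# A server is saturated when current sessions (scur, f5) hit the limit (slim, f7)
awk -F',' '$7 > 0 && $5 >= $7 {print $1 "/" $2 " saturated (scur=" $5 ", qcur=" $3 ")"}' <<<"$csv"
```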

Step 5: Validate Configuration Syntax

# Test config without applying it—safe to run on a live system
haproxy -c -f /etc/haproxy/haproxy.cfg

# Confirm which config file the running worker loaded (the path after -f)
ps -o args= -C haproxy

Fixing HAProxy 503 Errors

Fix 1: Re-enable a Downed or Maintenance Server (No Reload Required)

The runtime API allows instant state changes without touching config files or reloading:

# Bring a server out of MAINT or DRAIN back to normal operation
echo 'set server api_servers/web1 state ready' | socat stdio /var/run/haproxy/admin.sock

# Confirm the new state
echo 'show servers state' | socat stdio /var/run/haproxy/admin.sock | grep web1

Fix 2: Repair or Relax the Health Check Configuration

If the backend is running but the health check is too strict (e.g., your /health endpoint occasionally returns 204 instead of 200):

backend api_servers
    option httpchk GET /health
    http-check expect rstatus ^(200|204)$
    # Reduce aggressiveness: longer interval, more tolerance
    server web1 10.0.1.10:8080 check inter 10s fall 3 rise 1
    server web2 10.0.1.11:8080 check inter 10s fall 3 rise 1
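The http-check expect rstatus directive takes an extended regex matched against the status code, so you can sanity-check a pattern with grep -E before reloading:

```shell
# The same anchored regex used in the rstatus directive above
pattern='^(200|204)$'

for code in 200 204 301 500; do
  if grep -qE "$pattern" <<<"$code"; then
    echo "$code -> passes health check"
  else
    echo "$code -> counts as a failed check"
  fi
done
```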

After editing, reload gracefully to avoid dropping active connections:

systemctl reload haproxy

Fix 3: Increase maxconn and Queue Limits for Pool Exhaustion

defaults
    maxconn 50000
    timeout queue 30s

backend api_servers
    balance roundrobin
    server web1 10.0.1.10:8080 check maxconn 5000
    server web2 10.0.1.11:8080 check maxconn 5000

Also raise the OS file descriptor limit for the HAProxy process:

# Create a systemd override
mkdir -p /etc/systemd/system/haproxy.service.d
cat > /etc/systemd/system/haproxy.service.d/limits.conf <<EOF
[Service]
LimitNOFILE=100000
EOF
systemctl daemon-reload && systemctl reload haproxy
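HAProxy generally needs two file descriptors per proxied connection (one for the client side, one for the server side) plus overhead for listeners, checks, and logs, so size LimitNOFILE from maxconn rather than guessing. A rough sizing sketch:

```shell
# Rule of thumb: one fd per client socket + one per server socket,
# plus headroom for listeners, health checks, and logging.
maxconn=50000
headroom=100

echo "suggested LimitNOFILE: $((maxconn * 2 + headroom))"
```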

Fix 4: Add a default_backend to Catch Unmatched Requests

frontend http_in
    bind *:80
    use_backend api_servers if { path_beg /api/ }
    use_backend static_servers if { path_beg /static/ }
    default_backend api_servers

Fix 5: Add a Backup Server for Graceful Degradation

Instead of serving a raw 503, route to a maintenance page backend when all primary servers are down:

backend api_servers
    server web1 10.0.1.10:8080 check
    server web2 10.0.1.11:8080 check
    server maintenance 10.0.1.99:8080 backup

# Or use a static error file
defaults
    errorfile 503 /etc/haproxy/errors/503.http
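Note that an errorfile must contain a complete raw HTTP response, headers included, because HAProxy sends the file's bytes verbatim. A sketch that writes a minimal custom 503 page to a temp file (swap in /etc/haproxy/errors/503.http in production):

```shell
# Write a minimal, valid errorfile; HAProxy sends these bytes verbatim,
# so the status line and the blank line before the body are mandatory.
out=$(mktemp)
cat > "$out" <<'EOF'
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>We will be right back shortly.</h1></body></html>
EOF

head -n 1 "$out"
```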

Distinguishing 503 from Related HAProxy Errors

503 Service Unavailable — HAProxy could not select any backend server. The pool is empty, all servers are DOWN/MAINT, queue is exhausted, or ACL routing has no match. HAProxy never opened a TCP connection to a backend. Log shows <NOSRV> and termination state SC--.

502 Bad Gateway — HAProxy connected to a backend successfully, but the backend returned an invalid or incomplete HTTP response (non-HTTP data, closed connection mid-header, malformed status line). Log shows the actual server name with termination flags SH-- or PH--. Fix by inspecting the backend application for crashes or protocol mismatches.

504 Gateway Timeout — HAProxy connected to the backend, but it did not send a complete response within timeout server. Log shows termination state sH-- (server-side timeout while waiting for response headers) with a large server response time. Fix by increasing timeout server or optimizing slow backend queries.

Connection Refused — TCP-level error visible as L4CON in health check status: nc: connect to 10.0.1.10 port 8080: Connection refused. The backend process is not listening on that port. Start or restart the backend service.

Connection Reset — L4RST in health check status, or RST in tcpdump. The backend TCP stack accepted then immediately closed the connection. Causes include firewall rules dropping established connections, TLS handshake failures, or application-level connection limits being reached.

All-in-One Diagnostic Script
#!/usr/bin/env bash
# HAProxy 503 Diagnostic Script
# Run as root or a user with read access to the HAProxy stats socket

SOCKET="/var/run/haproxy/admin.sock"
CFG="/etc/haproxy/haproxy.cfg"

echo "======================================="
echo " HAProxy 503 Diagnostic"
echo "======================================="

# 1. Process health
echo ""
echo "--- HAProxy Process Status ---"
if systemctl is-active --quiet haproxy; then
  echo "[OK] haproxy service is ACTIVE"
else
  echo "[FAIL] haproxy service is NOT running -- start it with: systemctl start haproxy"
fi
ps aux | grep '[h]aproxy' | awk '{print "  PID:",$2,"\tUser:",$1,"\tArgs:",$NF}'

# 2. Config syntax check
echo ""
echo "--- Configuration Syntax Check ---"
if haproxy -c -f "$CFG" 2>&1; then
  echo "[OK] Config syntax valid"
else
  echo "[FAIL] Config has errors -- fix before reloading"
fi

# 3. Server states via socket
echo ""
echo "--- Backend Server States ---"
if [ -S "$SOCKET" ]; then
  echo 'show servers state' | socat stdio "$SOCKET"
else
  echo "[WARN] Socket not found at $SOCKET"
  echo "       Check 'stats socket' directive in $CFG"
fi

# 4. Queue depths and connection saturation
echo ""
echo "--- Session Counts and Queue Depths ---"
echo "(qcur > 0 during 503s = pool exhaustion)"
if [ -S "$SOCKET" ]; then
  echo 'show stat' | socat stdio "$SOCKET" | \
    awk -F',' 'NR==1 || $2=="BACKEND" || $2=="FRONTEND" {print $1","$2","$3","$5","$7","$18}' | \
    column -t -s,
fi

# 5. Global limits
echo ""
echo "--- Global HAProxy Limits ---"
if [ -S "$SOCKET" ]; then
  echo 'show info' | socat stdio "$SOCKET" | grep -E 'MaxConn|CurrConns|MaxConnRate|Nbproc|Uptime'
fi

# 6. Recent 503s in logs
echo ""
echo "--- Recent 503 Log Entries (last 30 min) ---"
if journalctl -u haproxy --since '30 minutes ago' 2>/dev/null | grep -c ' 503 ' | grep -qv '^0$'; then
  journalctl -u haproxy --since '30 minutes ago' | grep ' 503 ' | tail -10
else
  grep ' 503 ' /var/log/haproxy.log 2>/dev/null | tail -10 || echo "No 503s found in logs"
fi

# 7. Direct backend connectivity test
echo ""
echo "--- Direct Backend Connectivity Tests ---"
grep -E '^[[:space:]]+server [^ ]+ [0-9]+\.' "$CFG" | awk '{print $3}' | sort -u | while read addr; do
  host=$(echo "$addr" | cut -d: -f1)
  port=$(echo "$addr" | cut -d: -f2)
  if nc -zv -w3 "$host" "$port" 2>&1 | grep -q succeeded; then
    echo "  [OK]   TCP reachable: $host:$port"
  else
    echo "  [FAIL] TCP UNREACHABLE: $host:$port  <-- likely 503 cause"
  fi
done

echo ""
echo "======================================="
echo " Fix Commands Quick Reference"
echo "======================================="
echo ""
echo "Re-enable a downed server (no reload needed):"
echo "  echo 'set server <backend>/<server> state ready' | socat stdio $SOCKET"
echo ""
echo "Place a server into maintenance mode:"
echo "  echo 'set server <backend>/<server> state maint' | socat stdio $SOCKET"
echo ""
echo "Graceful reload (zero dropped connections):"
echo "  systemctl reload haproxy"
echo ""
echo "Force a health check immediately:"
echo "  echo 'set server <backend>/<server> health up' | socat stdio $SOCKET"
echo ""
echo "Drain a server before planned maintenance:"
echo "  echo 'set server <backend>/<server> state drain' | socat stdio $SOCKET"

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and SRE engineers with 10+ years of experience operating high-traffic systems on HAProxy, NGINX, and cloud load balancers. Our guides are written from real production incident postmortems and tested against live environments before publication.
