Error Medic

ArgoCD Connection Refused: Fix CrashLoopBackOff, ImagePullBackOff, Permission Denied & Timeout Errors

Fix ArgoCD connection refused errors: diagnose CrashLoopBackOff, ImagePullBackOff, permission denied, and timeout with step-by-step kubectl commands and config

Key Takeaways
  • ArgoCD 'connection refused' on port 443 is caused by the argocd-server pod being down, a Service selector mismatch, TLS misconfiguration, or a NetworkPolicy silently dropping packets before the TCP handshake completes.
  • CrashLoopBackOff in ArgoCD components most commonly signals OOMKilled resource limits on argocd-repo-server, a corrupted or missing argocd-secret, an invalid Redis connection string, or a broken kubeconfig mount in the application controller.
  • ImagePullBackOff occurs when the container image tag does not exist on the registry, the cluster lacks imagePullSecrets for a private registry, or Docker Hub rate limiting is encountered on shared-IP nodes.
  • Permission denied errors arise when the argocd-manager ClusterRoleBinding in the destination cluster was deleted or its token expired, or when the argocd-rbac-cm ConfigMap contains an overly restrictive policy.csv.
  • Timeout errors during sync or API calls are typically caused by argocd-repo-server exhausting memory while rendering large Helm charts, or by the default 30-second deadline being too short for slow API servers.
  • Quick fix summary: run 'kubectl get pods -n argocd' first, check Events with 'kubectl describe pod', validate argocd-secret keys, then address the specific root cause before restarting pods.
Fix Approaches Compared
  • Restart argocd-server Deployment — when: pod is in CrashLoopBackOff or Pending after a config change; time: < 2 min; risk: low (graceful rolling restart)
  • Patch argocd-secret or argocd-cm — when: wrong admin password hash, Dex config error, or missing TLS keys; time: 5–10 min; risk: medium (wrong values break login entirely)
  • Re-apply argocd-manager ClusterRoleBinding — when: permission denied syncing to a remote cluster, expired token, or deleted binding; time: ~5 min; risk: low (additive permission grant)
  • Rebuild imagePullSecret — when: ImagePullBackOff on a private registry or an expired credential token; time: 5–10 min; risk: low (secret replacement is non-disruptive)
  • Increase resource limits on repo-server — when: OOMKilled CrashLoopBackOff while rendering large Helm charts or Kustomize overlays; time: ~10 min plus rollout; risk: low (monitored rolling update)
  • Fix NetworkPolicy or Ingress annotations — when: connection refused from outside the cluster or between ArgoCD internal components; time: 15–30 min; risk: medium (may briefly disrupt in-flight syncs)
  • Re-register destination cluster — when: ArgoCD cannot connect to a managed cluster due to a rotated certificate or expired token; time: ~10 min; risk: low (cluster entry is cleanly replaced)

Understanding ArgoCD Connection Refused and Related Errors

ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. Its architecture comprises several pods in the argocd namespace: argocd-server (API and UI gateway on port 443/8080), argocd-repo-server (clones repositories and renders manifests on port 8081), argocd-application-controller (StatefulSet that reconciles desired vs live state), argocd-dex-server (OIDC connector on port 5556), and argocd-redis (in-memory cache on port 6379). When any component fails, errors cascade — a Redis crash causes argocd-server to crash, which produces connection refused for every client.
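As a quick sanity check on that layout, you can list the Services and the ports they expose (the Service names assume a standard install from the official manifests):

```shell
# List each ArgoCD Service alongside its exposed and target ports;
# the output should mirror the component/port layout described above.
kubectl get svc -n argocd \
  -o custom-columns='NAME:.metadata.name,PORT:.spec.ports[*].port,TARGET:.spec.ports[*].targetPort'
```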

What the Error Messages Look Like

When argocd-server is unreachable, the CLI shows:

FATAL[0001] dial tcp <IP>:443: connect: connection refused

In the browser you see ERR_CONNECTION_REFUSED. This is distinct from a timeout (packets are silently dropped and nothing ever replies) and from a TLS error (the TCP connection succeeded but the handshake failed). A pod in CrashLoopBackOff produces connection refused intermittently: during each backoff window the container is not running at all, so nothing is listening on the port.
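A quick way to tell these three failure modes apart from a workstation is to probe the endpoint directly (`argocd.example.com` below is a placeholder for your hostname):

```shell
HOST=argocd.example.com   # placeholder: replace with your ArgoCD hostname

# TCP reachability: "connection refused" fails instantly,
# while a firewall drop hangs until the 5-second timeout expires.
timeout 5 bash -c "</dev/tcp/$HOST/443" && echo "TCP open" || echo "TCP closed or filtered"

# TLS handshake: only succeeds if TCP connected AND the certificate exchange worked.
openssl s_client -connect "$HOST:443" -servername "$HOST" </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -dates
```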


Step 1: Triage All ArgoCD Pod Statuses

Begin every investigation with a full namespace snapshot:

kubectl get pods -n argocd -o wide
kubectl get svc,endpoints -n argocd
kubectl get events -n argocd --sort-by='.lastTimestamp' | tail -40

Expected healthy output:

NAME                                 READY   STATUS    RESTARTS   AGE
argocd-application-controller-0      1/1     Running   0          2d
argocd-dex-server-7f8d9b4c6-xk2pq    1/1     Running   0          2d
argocd-redis-6b4d5f8c7-lmnop         1/1     Running   0          2d
argocd-repo-server-5c9d8f7b6-qrstu   1/1     Running   0          2d
argocd-server-6d7e8f9a5-vwxyz        1/1     Running   0          2d

If any pod shows CrashLoopBackOff, ImagePullBackOff, Pending, or a restart count above 5, that component is your starting point.
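In a busy namespace, a convenience one-liner (not an official command, just an awk filter over the columns above) surfaces only the problem pods:

```shell
# Print only pods that are not Running, or that have more than 5 restarts.
# Columns: NAME READY STATUS RESTARTS AGE -> $3 is STATUS, $4 is RESTARTS.
kubectl get pods -n argocd --no-headers \
  | awk '$3 != "Running" || $4+0 > 5 {print $1, $3, "restarts=" $4}'
```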


Step 2: Diagnosing and Fixing CrashLoopBackOff

CrashLoopBackOff means the container started, exited with a non-zero code, and Kubernetes is applying an exponential backoff before the next restart attempt (starting at 10s, doubling up to 5 minutes).

2a. OOMKilled — Out of Memory

kubectl describe pod -n argocd <pod-name> | grep -A8 'Last State'

If you see Reason: OOMKilled, the container exceeded its memory limit. Increase the limit for the affected Deployment:

kubectl patch deployment argocd-repo-server -n argocd \
  --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"768Mi"}]'

argocd-repo-server is the most memory-intensive component when rendering Helm charts with many dependencies or Kustomize overlays on monorepos. Start with 512Mi and increase if OOMKills persist.
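Before patching, it helps to confirm what the current requests and limits actually are, and afterwards to watch the rollout complete:

```shell
# Show the current resource requests/limits on argocd-repo-server.
kubectl get deployment argocd-repo-server -n argocd \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'; echo

# After patching, confirm the rolling update finished cleanly.
kubectl rollout status deployment/argocd-repo-server -n argocd --timeout=120s
```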

2b. Broken or Missing argocd-secret

ArgoCD requires a Secret named argocd-secret for the admin password hash, TLS certificates, and encryption key. If it is absent or missing keys, argocd-server crashes immediately:

panic: Failed to initialize server: could not read admin password: secret "argocd-secret" not found

Verify required keys are present:

kubectl get secret argocd-secret -n argocd -o jsonpath='{.data}' | \
  python3 -c "import sys,json; [print(k) for k in json.load(sys.stdin)]"

Expected keys: admin.password, admin.passwordMtime, server.secretkey, tls.crt, tls.key.
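A small check that reports each missing key explicitly (the key list matches the expected set above):

```shell
# Verify every required argocd-secret key exists; print MISSING for any gap.
kubectl get secret argocd-secret -n argocd -o json | python3 -c '
import json, sys
required = ["admin.password", "admin.passwordMtime", "server.secretkey", "tls.crt", "tls.key"]
data = json.load(sys.stdin).get("data", {})
for k in required:
    print(("OK      " if k in data else "MISSING ") + k)
'
```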

To reset the admin password (requires htpasswd from the apache2-utils package):

HASHED=$(htpasswd -nbBC 10 "" 'NewSecurePass123!' | tr -d ':\n' | sed 's/$2y/$2a/')
kubectl -n argocd patch secret argocd-secret \
  -p "{\"stringData\":{\"admin.password\":\"$HASHED\",\"admin.passwordMtime\":\"$(date +%FT%T%Z)\"}}"

2c. Redis Connection Failure Causing Cascade Crash

If argocd-redis is down, both argocd-server and argocd-repo-server will crash with:

FATAL[0000] Failed to connect to Redis: dial tcp argocd-redis:6379: connect: connection refused

Verify Redis health and its Service endpoints:

kubectl get pod -n argocd -l app.kubernetes.io/name=argocd-redis
kubectl get endpoints argocd-redis -n argocd

If the pod is Running but Endpoints is empty, the Service label selector is broken — commonly by a partial Helm upgrade that changed label schemas between chart versions.
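To confirm a selector mismatch, compare the Service's selector against the live pod labels — every key/value in the selector must appear on the pod:

```shell
# The Service selector...
kubectl get svc argocd-redis -n argocd -o jsonpath='{.spec.selector}'; echo

# ...must be a subset of the labels actually present on the Redis pod.
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-redis \
  --show-labels --no-headers
```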


Step 3: Diagnosing and Fixing ImagePullBackOff

The kubelet reports this error when it cannot pull the container image. kubectl describe pod will show one of these Event messages:

Failed to pull image "quay.io/argoproj/argocd:v2.99.0": manifest unknown: manifest unknown
Failed to pull image "quay.io/argoproj/argocd:v2.9.0": unauthorized: access to the requested resource is not authorized

Fix: Verify the Image Tag Exists

# Check the exact image reference the Deployment is using
kubectl get deployment argocd-server -n argocd \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Cross-check the tag exists on quay.io
curl -s "https://quay.io/api/v1/repository/argoproj/argocd/tag/?specificTag=v2.9.0" | \
  jq '.tags[0].name'

Fix: Create an imagePullSecret for Private Registries

kubectl create secret docker-registry argocd-registry-creds \
  --docker-server=<your-registry> \
  --docker-username=<username> \
  --docker-password=<token-or-password> \
  -n argocd

# Patch the argocd-server ServiceAccount to use it
kubectl patch serviceaccount argocd-server -n argocd \
  -p '{"imagePullSecrets":[{"name":"argocd-registry-creds"}]}'

# Restart affected pods to pick up the new pull secret
kubectl rollout restart deployment argocd-server argocd-repo-server -n argocd
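If you suspect Docker Hub rate limiting (mentioned in the takeaways above), Docker documents a probe against its `ratelimitpreview/test` image that reports remaining anonymous pulls via response headers — a sketch:

```shell
# Fetch an anonymous pull token, then read the rate-limit headers Docker Hub
# returns on a manifest request. Run this from an affected node to see the
# limit that applies to that node's IP.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])")
curl -sI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i 'ratelimit'
```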

Step 4: Fixing Permission Denied Errors

4a. ArgoCD UI and CLI Access Denied

If users see permission denied in the UI or rpc error: code = PermissionDenied desc = permission denied from the CLI, the argocd-rbac-cm ConfigMap policy is too restrictive:

kubectl get configmap argocd-rbac-cm -n argocd -o yaml

A minimal policy granting a developer group view and sync access:

data:
  policy.csv: |
    p, role:viewer, applications, get, */*, allow
    p, role:viewer, applications, sync, */*, allow
    p, role:viewer, clusters, get, *, allow
    g, my-org:developers, role:viewer
  policy.default: role:readonly
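The argocd CLI can lint a policy and dry-run a specific permission check before you apply it, which avoids locking users out with a typo:

```shell
# Extract the live policy to a file, lint it, then test one permission.
kubectl get configmap argocd-rbac-cm -n argocd \
  -o jsonpath='{.data.policy\.csv}' > /tmp/policy.csv

argocd admin settings rbac validate --policy-file /tmp/policy.csv
argocd admin settings rbac can role:viewer get applications '*/*' \
  --policy-file /tmp/policy.csv
```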

4b. Destination Cluster Permission Denied During Sync

When ArgoCD cannot create or update resources in the target cluster:

failed to sync: error creating ClusterRole: clusterroles.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:argocd:argocd-manager" cannot create resource "clusterroles"

The argocd-manager ClusterRoleBinding in the destination cluster was deleted or its ServiceAccount token expired. Re-register the cluster to restore it:

argocd cluster add <kubectl-context-name> --name <cluster-display-name>

This command re-creates the argocd-manager ServiceAccount, ClusterRole, and ClusterRoleBinding in the destination cluster and stores the new token in the argocd namespace.
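Registered clusters are stored as labeled Secrets in the argocd namespace, so you can confirm the re-registration landed and the connection is healthy:

```shell
# Each managed cluster is a Secret labeled argocd.argoproj.io/secret-type=cluster.
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=cluster

# The STATUS column should report Successful once the cluster is reachable.
argocd cluster list
```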


Step 5: Diagnosing and Fixing Timeout Errors

Timeout errors appear in the ArgoCD UI as:

RPC failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

or in argocd app sync output:

FATAL[0030] timed out waiting for sync to complete

5a. Increase the Repo Server Timeout

Edit argocd-cmd-params-cm to extend the deadline:

kubectl edit configmap argocd-cmd-params-cm -n argocd

Add or update these keys:

data:
  server.repo.server.timeout.seconds: "120"
  controller.repo.server.timeout.seconds: "120"

Apply the change by restarting the deployments:

kubectl rollout restart deployment/argocd-server deployment/argocd-repo-server -n argocd
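In the official install manifests these ConfigMap keys are wired into the Deployments as environment variables via configMapKeyRef; assuming that wiring, one way to confirm the restarted pod picked up the new value is:

```shell
# Check whether the repo-server timeout env var is set inside argocd-server.
kubectl exec -n argocd deploy/argocd-server -- \
  env | grep -i 'REPO_SERVER_TIMEOUT' || echo "timeout env var not set"
```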

5b. NetworkPolicy Blocking Internal Component Communication

ArgoCD components communicate on fixed ports. If strict NetworkPolicy rules are in place, verify connectivity:

# Test argocd-server to argocd-repo-server (gRPC manifests API)
kubectl exec -n argocd deploy/argocd-server -- nc -zv argocd-repo-server 8081

# Test argocd-server to argocd-redis
kubectl exec -n argocd deploy/argocd-server -- nc -zv argocd-redis 6379

# Test argocd-server to argocd-dex-server (OIDC token endpoint)
kubectl exec -n argocd deploy/argocd-server -- nc -zv argocd-dex-server 5556

Apply a permissive intra-namespace NetworkPolicy to allow all ArgoCD components to communicate:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-argocd-internal
  namespace: argocd
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: argocd
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: argocd
  policyTypes: [Ingress, Egress]

Step 6: Fixing Connection Refused When the Pod Is Already Running

If argocd-server shows Running but you still cannot connect, the issue is at the Service or Ingress layer:

# Check the Service type and ports
kubectl get svc argocd-server -n argocd -o wide

# Verify the Service selector matches the pod labels
kubectl get pod -n argocd -l app.kubernetes.io/name=argocd-server --show-labels

# If type=ClusterIP, use port-forward for local access
kubectl port-forward svc/argocd-server -n argocd 8080:443

For LoadBalancer Services that remain <pending> on cloud providers, check your cloud LB quota. On bare-metal clusters using MetalLB, verify the IPAddressPool has available addresses:

kubectl get ipaddresspool -n metallb-system
kubectl get l2advertisement -n metallb-system

TLS Passthrough vs SSL Termination

ArgoCD multiplexes HTTPS and gRPC on the same port 443 using ALPN. If your Ingress terminates TLS and re-encrypts without gRPC passthrough, the argocd CLI will fail with EOF or connection refused even though the browser works. For NGINX Ingress Controller, use TLS passthrough:

annotations:
  nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
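If your ingress controller cannot do TLS passthrough at all, the argocd CLI's --grpc-web flag tunnels gRPC over HTTP/1.1 instead, which works through ingresses that terminate TLS without gRPC support (hostname below is a placeholder):

```shell
# Fall back to gRPC-web when the ingress terminates TLS without ALPN/gRPC.
argocd login argocd.example.com --grpc-web --username admin
```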

Verification Checklist

After any fix, confirm full resolution:

# Confirm all pods Running with no new restarts accumulating
kubectl get pods -n argocd

# Log in via CLI
argocd login <server-hostname> --username admin --password <password> --insecure

# List and inspect application health
argocd app list
argocd app get <app-name> --refresh

# Force a sync if the app shows Degraded or Unknown
argocd app sync <app-name> --prune --force

Full Diagnostic Script

The script below gathers every check from this guide into a single pass:
#!/usr/bin/env bash
# ArgoCD Full Diagnostic Script
# Usage: bash argocd-diagnose.sh [namespace]
# Default namespace: argocd

NS="${1:-argocd}"

separator() { echo ""; echo "=============================="; echo "=== $1"; echo "=============================="; }

separator "Pod Status"
kubectl get pods -n "$NS" -o wide

separator "Services and Endpoints"
kubectl get svc,endpoints -n "$NS"

separator "Recent Events (newest 50)"
kubectl get events -n "$NS" --sort-by='.lastTimestamp' | tail -50

separator "argocd-server Logs (last 60 lines)"
kubectl logs -n "$NS" deployment/argocd-server --tail=60 2>&1

separator "argocd-repo-server Logs (last 60 lines)"
kubectl logs -n "$NS" deployment/argocd-repo-server --tail=60 2>&1

separator "argocd-application-controller Logs (last 60 lines)"
kubectl logs -n "$NS" statefulset/argocd-application-controller --tail=60 2>&1

separator "argocd-redis Logs (last 30 lines)"
kubectl logs -n "$NS" deployment/argocd-redis --tail=30 2>&1

separator "Describe Non-Running Pods"
for pod in $(kubectl get pods -n "$NS" --no-headers | awk '$3 != "Running" {print $1}'); do
  echo "--- Pod: $pod ---"
  kubectl describe pod "$pod" -n "$NS" | grep -A25 'State:\|Last State:\|Events:'
  echo ""
done

separator "argocd-secret Key Names (no values)"
kubectl get secret argocd-secret -n "$NS" -o json 2>/dev/null | \
  python3 -c "import sys,json; d=json.load(sys.stdin); [print(k) for k in d.get('data',{}).keys()]" || \
  echo "ERROR: argocd-secret not found — this will cause CrashLoopBackOff"

separator "argocd-cm ConfigMap"
kubectl get configmap argocd-cm -n "$NS" -o yaml 2>&1

separator "argocd-cmd-params-cm ConfigMap"
kubectl get configmap argocd-cmd-params-cm -n "$NS" -o yaml 2>&1

separator "argocd-rbac-cm ConfigMap"
kubectl get configmap argocd-rbac-cm -n "$NS" -o yaml 2>&1

separator "Resource Usage (top pods)"
kubectl top pods -n "$NS" 2>/dev/null || echo "metrics-server not available"

separator "Internal Network Connectivity Tests"
kubectl exec -n "$NS" deployment/argocd-server -- sh -c '
  echo -n "repo-server:8081 -> "; nc -zv argocd-repo-server 8081 2>&1
  echo -n "redis:6379       -> "; nc -zv argocd-redis 6379 2>&1
  echo -n "dex-server:5556  -> "; nc -zv argocd-dex-server 5556 2>&1
' 2>/dev/null || echo "Could not exec into argocd-server pod"

separator "Registered Remote Clusters"
argocd cluster list 2>/dev/null || echo "argocd CLI not authenticated — run: argocd login <server>"

separator "Application Health Summary"
argocd app list 2>/dev/null || echo "argocd CLI not authenticated"

separator "Diagnostic Complete"
echo "Review any ERROR or CrashLoopBackOff entries above."
echo "Check argocd-secret keys, Redis Endpoints, and NetworkPolicy rules first."

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers, SREs, and platform engineers with collective experience running Kubernetes workloads at scale across AWS EKS, Google GKE, Azure AKS, and on-premises clusters. We specialize in the Kubernetes GitOps ecosystem including ArgoCD, Flux CD, Tekton, and Crossplane. Our troubleshooting guides are validated against real production incidents, tested across multiple ArgoCD versions (2.6 through 2.12), and reviewed against official upstream documentation and GitHub issue history.
