Error Medic

Kubernetes ImagePullBackOff: Complete Troubleshooting Guide (2024)

Fix Kubernetes ImagePullBackOff errors fast. Step-by-step diagnosis for wrong image names, missing pull secrets, registry auth failures, and expired certs.

Key Takeaways
  • ImagePullBackOff means Kubernetes cannot pull the container image — causes range from a typo in the image tag to missing registry credentials or an expired TLS certificate on the registry
  • The kubelet backs off exponentially (10s → 20s → 40s … up to 5 min) each time a pull fails, which is why the pod stays stuck instead of retrying immediately
  • Quick fix checklist: verify the image name and tag exist, confirm imagePullSecrets are attached to the Pod's ServiceAccount or spec, check registry connectivity and TLS certificate validity, and review RBAC / node IAM roles for private-registry access
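Before working through the detailed steps, a single command can surface every affected pod. The helper below is a convenience wrapper of our own, not a kubectl built-in:

```shell
# List every pod (all namespaces) whose container is waiting on a failed
# image pull. The function name is our own convention.
pods_stuck_on_image_pull() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | grep -E 'ImagePullBackOff|ErrImagePull'
}
# Usage: pods_stuck_on_image_pull
```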
ImagePullBackOff Fix Approaches Compared
Method | When to Use | Time | Risk
Correct image name/tag in spec | Typo, deleted tag, or wrong registry hostname | < 2 min | None — just a YAML edit
Create/update imagePullSecret | Private registry; secret missing or rotated | 2–5 min | Low — secret is namespaced
Attach secret to ServiceAccount | All pods in namespace need registry access | 3 min | Low
Rotate expired registry TLS cert | Registry returns x509 certificate expired | 15–60 min | Medium — affects all nodes
Configure node IAM role (ECR/GCR/ACR) | Cloud-managed registry on same cloud provider | 10–20 min | Medium — IAM change
Mirror image to accessible registry | Air-gapped cluster or rate-limited registry | 5–30 min | Low — additive change
Patch containerd/docker registry config | Self-signed CA or insecure registry | 10 min | Medium — node-level change

Understanding Kubernetes ImagePullBackOff

When Kubernetes schedules a Pod, the kubelet on the target node asks the container runtime (containerd, CRI-O, or Docker) to pull the image listed in the Pod spec. If that pull fails, the kubelet records an ErrImagePull event. Between retries it sets the container's waiting reason to ImagePullBackOff (which kubectl surfaces as the pod's STATUS) and starts an exponential back-off timer — waiting 10 s, then 20 s, 40 s, 80 s, up to a ceiling of roughly 5 minutes between retries.

You will see the pod status as:

NAME          READY   STATUS             RESTARTS   AGE
my-app-7d9f   0/1     ImagePullBackOff   0          4m

and events like the following (note the misspelled tag ltest, itself a common root cause):

Warning  Failed     4m   kubelet  Failed to pull image "myrepo/app:ltest": rpc error: code = NotFound
Warning  BackOff    3m   kubelet  Back-off pulling image "myrepo/app:ltest"
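The full error string is also available directly from the pod's status; a small helper (the name is ours) prints each container's waiting reason and message:

```shell
# Print waiting reason and full message per container -- the message names
# the exact registry call that failed. Hypothetical helper, not a kubectl verb.
image_pull_error() {
  local pod=$1 ns=${2:-default}
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{" - "}{.state.waiting.message}{"\n"}{end}'
}
# Usage: image_pull_error my-app-7d9f default
```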

ImagePullBackOff is closely related to several other Kubernetes errors you may encounter simultaneously:

  • CrashLoopBackOff — the image pulled successfully but the container exits immediately (application error, OOM kill, or missing env vars).
  • OOMKilled — the container runtime killed the container because it exceeded its memory limit; the pod may then enter CrashLoopBackOff.
  • Connection refused / Timeout — network-layer failures that can prevent the kubelet from reaching the image registry at all.
  • Permission denied — RBAC or filesystem permission errors unrelated to image pulling, but sometimes confused with registry auth failures in logs.
  • Certificate expired (x509: certificate has expired or is not yet valid) — an expired TLS certificate on the registry endpoint causes the image pull to fail with a cryptic TLS error rather than a simple 401.

Step 1: Diagnose — Identify the Exact Failure Reason

1a. Describe the pod and read its events

kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events: section. The Failed event message contains the actual error. Common messages:

Error fragment | Root cause
manifest unknown / not found / 404 | Image tag does not exist in registry
unauthorized / 401 / 403 | Missing or invalid imagePullSecret
x509: certificate has expired | Registry TLS certificate is expired
x509: certificate signed by unknown authority | Self-signed CA not trusted by node
dial tcp … connection refused | Registry unreachable (firewall, DNS, proxy)
toomanyrequests / 429 | Docker Hub rate limit hit
no space left on device | Node disk full — old images not garbage-collected
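To scan a namespace's recent Failed events for these fragments in one pass (a sketch of our own; extend the pattern list as needed):

```shell
# Grep Failed events for the error signatures listed in the table above.
scan_pull_errors() {
  kubectl get events -n "${1:-default}" --field-selector reason=Failed \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' \
    | grep -iE 'manifest unknown|not found|unauthorized|x509|connection refused|toomanyrequests|no space left'
}
# Usage: scan_pull_errors <namespace>
```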

1b. Check whether the image exists

# From your workstation (must have registry access)
docker manifest inspect myrepo/app:v1.2.3

# For ECR
aws ecr describe-images --repository-name app --image-ids imageTag=v1.2.3

# For GCR / Artifact Registry
gcloud container images describe us-central1-docker.pkg.dev/project/repo/app:v1.2.3
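If no Docker daemon is available, skopeo (when installed) can query the registry HTTP API directly, wrapped here in a small helper of our own:

```shell
# Check remote image existence without pulling it. skopeo talks to the
# registry API directly, so no local container runtime is needed.
check_image_exists() {
  if skopeo inspect "docker://$1" >/dev/null 2>&1; then
    echo "$1: manifest found"
  else
    echo "$1: missing, or credentials rejected"
  fi
}
# Usage: check_image_exists myrepo/app:v1.2.3
```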

1c. Inspect the imagePullSecret

# List secrets in the namespace
kubectl get secrets -n <namespace>

# Check which secret is referenced
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'

# Decode the secret and verify credentials
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | python3 -m json.tool

1d. Test registry connectivity from inside the cluster

kubectl run curl-test --image=curlimages/curl:latest --restart=Never --rm -it -- \
  curl -v https://registry.example.com/v2/

An HTTP 401 response here is expected and harmless — it proves the registry is reachable and its TLS certificate validates. Connection timeouts or x509 errors instead point to network or certificate problems.

Step 2: Fix — Targeted Remediation

Fix A — Typo in image name or tag

Edit the Deployment (or other workload controller) and correct the image reference:

kubectl set image deployment/my-app <container-name>=myrepo/app:v1.2.3 -n <namespace>
# or edit the manifest directly
kubectl edit deployment my-app -n <namespace>
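After correcting the image reference, the rollout can be watched until the new ReplicaSet's pods pull successfully (the helper name is ours):

```shell
# Block until the corrected Deployment finishes rolling out, or time out.
watch_image_fix() {
  kubectl rollout status "deployment/$1" -n "${2:-default}" --timeout=120s
}
# Usage: watch_image_fix my-app <namespace>
```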

Fix B — Create or rotate an imagePullSecret

For Docker Hub / generic registry:

kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<token> \
  --docker-email=<email> \
  -n <namespace>

For AWS ECR (the token expires every 12 h — refresh it via cron or the amazon-ecr-credential-helper):

AWS_ACCOUNT=123456789012
REGION=us-east-1
PASSWORD=$(aws ecr get-login-password --region $REGION)
kubectl create secret docker-registry ecr-secret \
  --docker-server=${AWS_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$PASSWORD \
  -n <namespace>

Then reference the secret in the Pod spec:

spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myrepo/app:v1.2.3
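Because the ECR token expires after 12 hours, the secret must be recreated on a schedule. A minimal refresh function a cron job could call (delete-then-create keeps it idempotent; names match the example above):

```shell
# Recreate the ECR pull secret with a fresh 12-hour token.
refresh_ecr_secret() {
  local ns=$1 account=$2 region=$3 password
  password=$(aws ecr get-login-password --region "$region") || return 1
  # --ignore-not-found makes the delete safe on the first run
  kubectl delete secret ecr-secret -n "$ns" --ignore-not-found
  kubectl create secret docker-registry ecr-secret \
    --docker-server="${account}.dkr.ecr.${region}.amazonaws.com" \
    --docker-username=AWS \
    --docker-password="$password" \
    -n "$ns"
}
# Usage (e.g. from cron every 8 h): refresh_ecr_secret default 123456789012 us-east-1
```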

Fix C — Attach the secret to the default ServiceAccount (namespace-wide)

kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'

All new pods in the namespace will automatically inherit the pull secret without needing imagePullSecrets in each spec.
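To confirm the patch took effect, print the ServiceAccount's pull secrets (the helper is our own shorthand):

```shell
# Print the pull secrets attached to a ServiceAccount; should include regcred.
sa_pull_secrets() {
  kubectl get serviceaccount "${2:-default}" -n "$1" \
    -o jsonpath='{.imagePullSecrets[*].name}'
  echo
}
# Usage: sa_pull_secrets <namespace> default
```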

Fix D — Expired TLS certificate on private registry

First confirm the certificate expiry:

echo | openssl s_client -connect registry.example.com:443 2>/dev/null | \
  openssl x509 -noout -dates

Renew the certificate on the registry host, then restart the registry service. If you are using cert-manager inside the cluster:

# Force immediate renewal with the cert-manager CLI
cmctl renew registry-tls -n cert-manager
# (deleting the Certificate's Secret also triggers re-issuance)
# Check renewal progress
kubectl describe certificate registry-tls -n cert-manager

Fix E — Distribute a self-signed CA to cluster nodes

If the registry uses an internal CA, every node's container runtime must trust it:

# Ubuntu / Debian nodes
sudo cp my-ca.crt /usr/local/share/ca-certificates/my-ca.crt
sudo update-ca-certificates
sudo systemctl restart containerd

For managed clusters (EKS, GKE, AKS) use a DaemonSet to push the CA at boot, or use the cloud provider's node-bootstrap mechanism.
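On containerd 1.5+ nodes there is a narrower alternative: trust the CA for a single registry via a certs.d drop-in instead of the system trust store. A sketch, assuming the example hostname used above (requires config_path = "/etc/containerd/certs.d" in the registry section of /etc/containerd/config.toml):

```
# /etc/containerd/certs.d/registry.example.com/hosts.toml
server = "https://registry.example.com"

[host."https://registry.example.com"]
  ca = "/etc/containerd/certs.d/registry.example.com/my-ca.crt"
```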


Fix F — Docker Hub rate limit (429 toomanyrequests)

# Add Docker Hub credentials to avoid anonymous limits
kubectl create secret docker-registry dockerhub-creds \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<access-token> \
  -n <namespace>

Long-term: mirror frequently-used public images to a private registry or use a pull-through cache (Harbor, Nexus, AWS ECR pull-through cache).
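On AWS, an ECR pull-through cache rule is one concrete way to set that up. The flag names below are from AWS CLI v2; note that Docker Hub as an upstream additionally requires a Secrets Manager credential passed via --credential-arn:

```shell
# Create a pull-through cache so Docker Hub images are cached in ECR.
create_dockerhub_cache() {
  local region=$1 credential_arn=$2
  aws ecr create-pull-through-cache-rule \
    --ecr-repository-prefix docker-hub \
    --upstream-registry-url registry-1.docker.io \
    --credential-arn "$credential_arn" \
    --region "$region"
}
# Images are then pulled as:
#   <account>.dkr.ecr.<region>.amazonaws.com/docker-hub/library/nginx:latest
```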


Step 3: Verify the Fix

# Watch pod status in real time
kubectl get pods -n <namespace> -w

# Confirm the image is being pulled
kubectl describe pod <pod-name> -n <namespace> | grep -A5 Events

# Check that the pod reaches Running state
kubectl wait --for=condition=Ready pod/<pod-name> -n <namespace> --timeout=120s

Related Issues to Investigate After Fixing ImagePullBackOff

Once the image pulls successfully, the pod may still fail with CrashLoopBackOff (the application crashes on startup), OOMKilled (memory limits too low), or Permission denied errors in the application logs. Use kubectl logs <pod> --previous to retrieve logs from the last crashed container and continue debugging from there.

For Kubernetes timeout errors during image pulls in high-latency environments, note that the kubelet's --image-pull-progress-deadline flag (default 1 minute) applied only to the dockershim runtime and was removed along with it; on containerd-based clusters, pre-pull large images onto the nodes or switch to a closer registry mirror instead.

ImagePullBackOff Diagnostic Script

Save the script below as diagnose-imagepullbackoff.sh and run it with a namespace and an optional pod name:

#!/usr/bin/env bash
# ============================================================
# Kubernetes ImagePullBackOff Diagnostic Script
# Usage: ./diagnose-imagepullbackoff.sh <namespace> [pod-name]
# ============================================================

NS=${1:-default}
POD=${2:-""}

echo "=== Pods in ImagePullBackOff or ErrImagePull ==="
kubectl get pods -n "$NS" --field-selector=status.phase!=Running \
  | grep -E 'ImagePullBackOff|ErrImagePull|Init:ImagePullBackOff'

if [[ -n "$POD" ]]; then
  echo ""
  echo "=== Events for pod: $POD ==="
  kubectl describe pod "$POD" -n "$NS" | awk '/Events:/,0'

  echo ""
  echo "=== imagePullSecrets on pod ==="
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}' && echo

  echo ""
  echo "=== Image references ==="
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
fi

echo ""
echo "=== Secrets in namespace $NS ==="
kubectl get secrets -n "$NS" --field-selector type=kubernetes.io/dockerconfigjson

echo ""
echo "=== ServiceAccount imagePullSecrets ==="
kubectl get serviceaccount default -n "$NS" \
  -o jsonpath='{.imagePullSecrets}' && echo

echo ""
echo "=== Node disk pressure (can block image pulls) ==="
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'

echo ""
echo "=== Recent kubelet image pull events (all namespaces) ==="
kubectl get events --all-namespaces \
  --field-selector reason=Failed \
  --sort-by='.lastTimestamp' \
  | grep -i 'pull\|image' | tail -20

# --- Registry TLS certificate check ---
# Set REGISTRY_HOST before running if you have a private registry
if [[ -n "$REGISTRY_HOST" ]]; then
  echo ""
  echo "=== TLS certificate expiry for $REGISTRY_HOST ==="
  echo | openssl s_client -connect "${REGISTRY_HOST}:443" 2>/dev/null \
    | openssl x509 -noout -subject -dates
fi

Error Medic Editorial

The Error Medic Editorial team is composed of senior SRE and DevOps engineers with hands-on experience operating Kubernetes clusters at scale across AWS, GCP, and Azure. We write precise, command-first troubleshooting guides that help engineers resolve production incidents quickly.
