Error Medic

Kubernetes ImagePullBackOff: Complete Troubleshooting Guide (2024)

Fix Kubernetes ImagePullBackOff errors fast. Step-by-step diagnosis for wrong image names, missing pull secrets, registry auth failures, and expired certs.

Key Takeaways
  • ImagePullBackOff means Kubernetes cannot pull the container image — causes range from a typo in the image tag to missing registry credentials or an expired TLS certificate on the registry
  • The kubelet backs off exponentially (10s → 20s → 40s … up to 5 min) each time a pull fails, which is why the pod stays stuck instead of retrying immediately
  • Quick fix checklist: verify the image name and tag exist, confirm imagePullSecrets are attached to the Pod's ServiceAccount or spec, check registry connectivity and TLS certificate validity, and review RBAC / node IAM roles for private-registry access
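Before working through the detailed steps, a single command can surface every affected pod. The helper below is a convenience wrapper of our own, not a kubectl built-in:

```shell
# List every pod (all namespaces) whose container is waiting on a failed
# image pull. The function name is our own convention.
pods_stuck_on_image_pull() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | grep -E 'ImagePullBackOff|ErrImagePull'
}
# Usage: pods_stuck_on_image_pull
```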
ImagePullBackOff Fix Approaches Compared
Method | When to Use | Time | Risk
Correct image name/tag in spec | Typo, deleted tag, or wrong registry hostname | < 2 min | None — just a YAML edit
Create/update imagePullSecret | Private registry; secret missing or rotated | 2–5 min | Low — secret is namespaced
Attach secret to ServiceAccount | All pods in namespace need registry access | 3 min | Low
Rotate expired registry TLS cert | Registry returns x509 certificate expired | 15–60 min | Medium — affects all nodes
Configure node IAM role (ECR/GCR/ACR) | Cloud-managed registry on same cloud provider | 10–20 min | Medium — IAM change
Mirror image to accessible registry | Air-gapped cluster or rate-limited registry | 5–30 min | Low — additive change
Patch containerd/docker registry config | Self-signed CA or insecure registry | 10 min | Medium — node-level change

Understanding Kubernetes ImagePullBackOff

When Kubernetes schedules a Pod, the kubelet on the target node asks the container runtime (containerd, CRI-O, or Docker) to pull the image listed in the Pod spec. If that pull fails, the kubelet records an ErrImagePull event. Between retries it sets the container's waiting reason to ImagePullBackOff (which kubectl surfaces as the pod's STATUS) and starts an exponential back-off timer — waiting 10 s, then 20 s, 40 s, 80 s, up to a ceiling of roughly 5 minutes between retries.

You will see the pod status as:

NAME          READY   STATUS             RESTARTS   AGE
my-app-7d9f   0/1     ImagePullBackOff   0          4m

and events like the following (note the misspelled tag ltest, itself a common root cause):

Warning  Failed     4m   kubelet  Failed to pull image "myrepo/app:ltest": rpc error: code = NotFound
Warning  BackOff    3m   kubelet  Back-off pulling image "myrepo/app:ltest"
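The full error string is also available directly from the pod's status; a small helper (the name is ours) prints each container's waiting reason and message:

```shell
# Print waiting reason and full message per container -- the message names
# the exact registry call that failed. Hypothetical helper, not a kubectl verb.
image_pull_error() {
  local pod=$1 ns=${2:-default}
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{" - "}{.state.waiting.message}{"\n"}{end}'
}
# Usage: image_pull_error my-app-7d9f default
```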

ImagePullBackOff is closely related to several other Kubernetes errors you may encounter simultaneously:

  • CrashLoopBackOff — the image pulled successfully but the container exits immediately (application error, OOM kill, or missing env vars).
  • OOMKilled — the container runtime killed the container because it exceeded its memory limit; the pod may then enter CrashLoopBackOff.
  • Connection refused / Timeout — network-layer failures that can prevent the kubelet from reaching the image registry at all.
  • Permission denied — RBAC or filesystem permission errors unrelated to image pulling, but sometimes confused with registry auth failures in logs.
  • Certificate expired (x509: certificate has expired or is not yet valid) — an expired TLS certificate on the registry endpoint causes the image pull to fail with a cryptic TLS error rather than a simple 401.

Step 1: Diagnose — Identify the Exact Failure Reason

1a. Describe the pod and read its events

kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events: section. The Failed event message contains the actual error. Common messages:

Error fragment | Root cause
manifest unknown / not found / 404 | Image tag does not exist in registry
unauthorized / 401 / 403 | Missing or invalid imagePullSecret
x509: certificate has expired | Registry TLS certificate is expired
x509: certificate signed by unknown authority | Self-signed CA not trusted by node
dial tcp … connection refused | Registry unreachable (firewall, DNS, proxy)
toomanyrequests / 429 | Docker Hub rate limit hit
no space left on device | Node disk full — old images not garbage-collected
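To scan a namespace's recent Failed events for these fragments in one pass (a sketch of our own; extend the pattern list as needed):

```shell
# Grep Failed events for the error signatures listed in the table above.
scan_pull_errors() {
  kubectl get events -n "${1:-default}" --field-selector reason=Failed \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' \
    | grep -iE 'manifest unknown|not found|unauthorized|x509|connection refused|toomanyrequests|no space left'
}
# Usage: scan_pull_errors <namespace>
```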

1b. Check whether the image exists

# From your workstation (must have registry access)
docker manifest inspect myrepo/app:v1.2.3

# For ECR
aws ecr describe-images --repository-name app --image-ids imageTag=v1.2.3

# For GCR / Artifact Registry
gcloud container images describe us-central1-docker.pkg.dev/project/repo/app:v1.2.3
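If no Docker daemon is available, skopeo (when installed) can query the registry HTTP API directly, wrapped here in a small helper of our own:

```shell
# Check remote image existence without pulling it. skopeo talks to the
# registry API directly, so no local container runtime is needed.
check_image_exists() {
  if skopeo inspect "docker://$1" >/dev/null 2>&1; then
    echo "$1: manifest found"
  else
    echo "$1: missing, or credentials rejected"
  fi
}
# Usage: check_image_exists myrepo/app:v1.2.3
```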

1c. Inspect the imagePullSecret

# List secrets in the namespace
kubectl get secrets -n <namespace>

# Check which secret is referenced
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'

# Decode the secret and verify credentials
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | python3 -m json.tool

1d. Test registry connectivity from inside the cluster

kubectl run curl-test --image=curlimages/curl:latest --restart=Never --rm -it -- \
  curl -v https://registry.example.com/v2/

An HTTP 401 response here is expected and harmless — it proves the registry is reachable and its TLS certificate validates. Connection timeouts or x509 errors instead point to network or certificate problems.

Step 2: Fix — Targeted Remediation

Fix A — Typo in image name or tag

Edit the Deployment (or other workload controller) and correct the image reference:

kubectl set image deployment/my-app <container-name>=myrepo/app:v1.2.3 -n <namespace>
# or edit the manifest directly
kubectl edit deployment my-app -n <namespace>
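After correcting the image reference, the rollout can be watched until the new ReplicaSet's pods pull successfully (the helper name is ours):

```shell
# Block until the corrected Deployment finishes rolling out, or time out.
watch_image_fix() {
  kubectl rollout status "deployment/$1" -n "${2:-default}" --timeout=120s
}
# Usage: watch_image_fix my-app <namespace>
```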

Fix B — Create or rotate an imagePullSecret

For Docker Hub / generic registry:

kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<token> \
  --docker-email=<email> \
  -n <namespace>

For AWS ECR (the token expires every 12 h — refresh it via cron or the amazon-ecr-credential-helper):

AWS_ACCOUNT=123456789012
REGION=us-east-1
PASSWORD=$(aws ecr get-login-password --region $REGION)
kubectl create secret docker-registry ecr-secret \
  --docker-server=${AWS_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$PASSWORD \
  -n <namespace>

Then reference the secret in the Pod spec:

spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myrepo/app:v1.2.3
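Because the ECR token expires after 12 hours, the secret must be recreated on a schedule. A minimal refresh function a cron job could call (delete-then-create keeps it idempotent; names match the example above):

```shell
# Recreate the ECR pull secret with a fresh 12-hour token.
refresh_ecr_secret() {
  local ns=$1 account=$2 region=$3 password
  password=$(aws ecr get-login-password --region "$region") || return 1
  # --ignore-not-found makes the delete safe on the first run
  kubectl delete secret ecr-secret -n "$ns" --ignore-not-found
  kubectl create secret docker-registry ecr-secret \
    --docker-server="${account}.dkr.ecr.${region}.amazonaws.com" \
    --docker-username=AWS \
    --docker-password="$password" \
    -n "$ns"
}
# Usage (e.g. from cron every 8 h): refresh_ecr_secret default 123456789012 us-east-1
```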

Fix C — Attach the secret to the default ServiceAccount (namespace-wide)

kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'

All new pods in the namespace will automatically inherit the pull secret without needing imagePullSecrets in each spec.
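To confirm the patch took effect, print the ServiceAccount's pull secrets (the helper is our own shorthand):

```shell
# Print the pull secrets attached to a ServiceAccount; should include regcred.
sa_pull_secrets() {
  kubectl get serviceaccount "${2:-default}" -n "$1" \
    -o jsonpath='{.imagePullSecrets[*].name}'
  echo
}
# Usage: sa_pull_secrets <namespace> default
```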

Fix D — Expired TLS certificate on private registry

First confirm the certificate expiry:

echo | openssl s_client -connect registry.example.com:443 2>/dev/null | \
  openssl x509 -noout -dates

Renew the certificate on the registry host, then restart the registry service. If you are using cert-manager inside the cluster:

# Force immediate renewal with the cert-manager CLI
cmctl renew registry-tls -n cert-manager
# (deleting the Certificate's Secret also triggers re-issuance)
# Check renewal progress
kubectl describe certificate registry-tls -n cert-manager

Fix E — Distribute a self-signed CA to cluster nodes

If the registry uses an internal CA, every node's container runtime must trust it:

# Ubuntu / Debian nodes
sudo cp my-ca.crt /usr/local/share/ca-certificates/my-ca.crt
sudo update-ca-certificates
sudo systemctl restart containerd

For managed clusters (EKS, GKE, AKS) use a DaemonSet to push the CA at boot, or use the cloud provider's node-bootstrap mechanism.
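On containerd 1.5+ nodes there is a narrower alternative: trust the CA for a single registry via a certs.d drop-in instead of the system trust store. A sketch, assuming the example hostname used above (requires config_path = "/etc/containerd/certs.d" in the registry section of /etc/containerd/config.toml):

```
# /etc/containerd/certs.d/registry.example.com/hosts.toml
server = "https://registry.example.com"

[host."https://registry.example.com"]
  ca = "/etc/containerd/certs.d/registry.example.com/my-ca.crt"
```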


Fix F — Docker Hub rate limit (429 toomanyrequests)

# Add Docker Hub credentials to avoid anonymous limits
kubectl create secret docker-registry dockerhub-creds \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<access-token> \
  -n <namespace>

Long-term: mirror frequently-used public images to a private registry or use a pull-through cache (Harbor, Nexus, AWS ECR pull-through cache).
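On AWS, an ECR pull-through cache rule is one concrete way to set that up. The flag names below are from AWS CLI v2; note that Docker Hub as an upstream additionally requires a Secrets Manager credential passed via --credential-arn:

```shell
# Create a pull-through cache so Docker Hub images are cached in ECR.
create_dockerhub_cache() {
  local region=$1 credential_arn=$2
  aws ecr create-pull-through-cache-rule \
    --ecr-repository-prefix docker-hub \
    --upstream-registry-url registry-1.docker.io \
    --credential-arn "$credential_arn" \
    --region "$region"
}
# Images are then pulled as:
#   <account>.dkr.ecr.<region>.amazonaws.com/docker-hub/library/nginx:latest
```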


Step 3: Verify the Fix

# Watch pod status in real time
kubectl get pods -n <namespace> -w

# Confirm the image is being pulled
kubectl describe pod <pod-name> -n <namespace> | grep -A5 Events

# Check that the pod reaches Running state
kubectl wait --for=condition=Ready pod/<pod-name> -n <namespace> --timeout=120s

Related Issues to Investigate After Fixing ImagePullBackOff

Once the image pulls successfully, the pod may still fail with CrashLoopBackOff (the application crashes on startup), OOMKilled (memory limits too low), or Permission denied errors in the application logs. Use kubectl logs <pod> --previous to retrieve logs from the last crashed container and continue debugging from there.

For Kubernetes timeout errors during image pulls in high-latency environments, note that the kubelet's --image-pull-progress-deadline flag (default 1 minute) applied only to the dockershim runtime and was removed along with it; on containerd-based clusters, pre-pull large images onto the nodes or switch to a closer registry mirror instead.

ImagePullBackOff Diagnostic Script

Save the script below as diagnose-imagepullbackoff.sh and run it with a namespace and an optional pod name:

#!/usr/bin/env bash
# ============================================================
# Kubernetes ImagePullBackOff Diagnostic Script
# Usage: ./diagnose-imagepullbackoff.sh <namespace> [pod-name]
# ============================================================

NS=${1:-default}
POD=${2:-""}

echo "=== Pods in ImagePullBackOff or ErrImagePull ==="
kubectl get pods -n "$NS" --field-selector=status.phase!=Running \
  | grep -E 'ImagePullBackOff|ErrImagePull|Init:ImagePullBackOff'

if [[ -n "$POD" ]]; then
  echo ""
  echo "=== Events for pod: $POD ==="
  kubectl describe pod "$POD" -n "$NS" | awk '/Events:/,0'

  echo ""
  echo "=== imagePullSecrets on pod ==="
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}' && echo

  echo ""
  echo "=== Image references ==="
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
fi

echo ""
echo "=== Secrets in namespace $NS ==="
kubectl get secrets -n "$NS" --field-selector type=kubernetes.io/dockerconfigjson

echo ""
echo "=== ServiceAccount imagePullSecrets ==="
kubectl get serviceaccount default -n "$NS" \
  -o jsonpath='{.imagePullSecrets}' && echo

echo ""
echo "=== Node disk pressure (can block image pulls) ==="
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'

echo ""
echo "=== Recent kubelet image pull events (all namespaces) ==="
kubectl get events --all-namespaces \
  --field-selector reason=Failed \
  --sort-by='.lastTimestamp' \
  | grep -i 'pull\|image' | tail -20

# --- Registry TLS certificate check ---
# Set REGISTRY_HOST before running if you have a private registry
if [[ -n "$REGISTRY_HOST" ]]; then
  echo ""
  echo "=== TLS certificate expiry for $REGISTRY_HOST ==="
  echo | openssl s_client -connect "${REGISTRY_HOST}:443" 2>/dev/null \
    | openssl x509 -noout -subject -dates
fi

Error Medic Editorial

The Error Medic Editorial team is composed of senior SRE and DevOps engineers with hands-on experience operating Kubernetes clusters at scale across AWS, GCP, and Azure. We write precise, command-first troubleshooting guides that help engineers resolve production incidents quickly.
