Kubernetes ImagePullBackOff: Complete Troubleshooting Guide (2024)
Fix Kubernetes ImagePullBackOff errors fast. Step-by-step diagnosis for wrong image names, missing pull secrets, registry auth failures, and expired certs.
- ImagePullBackOff means Kubernetes cannot pull the container image — causes range from a typo in the image tag to missing registry credentials or an expired TLS certificate on the registry
- The kubelet backs off exponentially (10s → 20s → 40s … up to 5 min) each time a pull fails, which is why the pod stays stuck instead of retrying immediately
- Quick fix checklist: verify the image name and tag exist, confirm imagePullSecrets are attached to the Pod's ServiceAccount or spec, check registry connectivity and TLS certificate validity, and review RBAC / node IAM roles for private-registry access
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Correct image name/tag in spec | Typo, deleted tag, or wrong registry hostname | < 2 min | None — just a YAML edit |
| Create/update imagePullSecret | Private registry; secret missing or rotated | 2–5 min | Low — secret is namespaced |
| Attach secret to ServiceAccount | All pods in namespace need registry access | 3 min | Low |
| Rotate expired registry TLS cert | Registry returns x509 certificate expired | 15–60 min | Medium — affects all nodes |
| Configure node IAM role (ECR/GCR/ACR) | Cloud-managed registry on same cloud provider | 10–20 min | Medium — IAM change |
| Mirror image to accessible registry | Air-gapped cluster or rate-limited registry | 5–30 min | Low — additive change |
| Patch containerd/docker registry config | Self-signed CA or insecure registry | 10 min | Medium — node-level change |
Understanding Kubernetes ImagePullBackOff
When Kubernetes schedules a Pod, the kubelet on the target node asks the container runtime (containerd, CRI-O, or Docker) to pull the image listed in the Pod spec. If that pull fails, the kubelet records an ErrImagePull event. After the first failure it sets the container's waiting reason to ImagePullBackOff and starts an exponential back-off timer, waiting 10 s, then 20 s, 40 s, 80 s, up to a ceiling of roughly 5 minutes between retries.
You will see the pod status as:
NAME READY STATUS RESTARTS AGE
my-app-7d9f 0/1 ImagePullBackOff 0 4m
and the events:
Warning Failed 4m kubelet Failed to pull image "myrepo/app:ltest": rpc error: code = NotFound
Warning BackOff 3m kubelet Back-off pulling image "myrepo/app:ltest"
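The BackOff intervals in those events follow the kubelet's doubling schedule. A minimal local sketch of that schedule, assuming the kubelet defaults (10 s initial delay, doubling per failure, 300 s cap):

```shell
# Sketch of the kubelet's image-pull back-off schedule:
# start at 10s, double on each failure, cap at 300s (kubelet defaults).
delay=10
schedule=""
for attempt in 1 2 3 4 5 6 7; do
  schedule="$schedule $delay"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
echo "Back-off delays (s):$schedule"
# Prints: Back-off delays (s): 10 20 40 80 160 300 300
```

The practical consequence: even after you fix the root cause, the pod waits out the current back-off interval before retrying. Delete the pod (and let its Deployment replace it) to force an immediate pull.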
ImagePullBackOff is closely related to several other Kubernetes errors you may encounter simultaneously:
- CrashLoopBackOff — the image pulled successfully but the container exits immediately (application error, OOM kill, or missing env vars).
- OOMKilled — the container runtime killed the container because it exceeded its memory limit; the pod may then enter CrashLoopBackOff.
- Connection refused / Timeout — network-layer failures that can prevent the kubelet from reaching the image registry at all.
- Permission denied — RBAC or filesystem permission errors unrelated to image pulling, but sometimes confused with registry auth failures in logs.
- Certificate expired (x509: certificate has expired or is not yet valid) — an expired TLS certificate on the registry endpoint causes the image pull to fail with a cryptic TLS error rather than a simple 401.
Step 1: Diagnose — Identify the Exact Failure Reason
1a. Describe the pod and read its events
kubectl describe pod <pod-name> -n <namespace>
Scroll to the Events: section. The Failed event message contains the actual error. Common messages:
| Error fragment | Root cause |
|---|---|
| manifest unknown / not found / 404 | Image tag does not exist in registry |
| unauthorized / 401 / 403 | Missing or invalid imagePullSecret |
| x509: certificate has expired | Registry TLS certificate is expired |
| x509: certificate signed by unknown authority | Self-signed CA not trusted by node |
| dial tcp … connection refused | Registry unreachable (firewall, DNS, proxy) |
| toomanyrequests / 429 | Docker Hub rate limit hit |
| no space left on device | Node disk full — old images not garbage-collected |
1b. Check whether the image exists
# From your workstation (must have registry access)
docker manifest inspect myrepo/app:v1.2.3
# For ECR
aws ecr describe-images --repository-name app --image-ids imageTag=v1.2.3
# For GCR / Artifact Registry
gcloud container images describe us-central1-docker.pkg.dev/project/repo/app:v1.2.3
1c. Inspect the imagePullSecret
# List secrets in the namespace
kubectl get secrets -n <namespace>
# Check which secret is referenced
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'
# Decode the secret and verify credentials
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | python3 -m json.tool
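If you are unsure what the decoded payload should look like, here is a sketch that builds a minimal valid .dockerconfigjson from dummy values (registry.example.com, user, and tok are placeholders). The key point is that the auth field must be base64(username:password) and the top-level key must match the registry hostname in your image reference:

```shell
# Build a minimal .dockerconfigjson from dummy credentials and validate it.
# In a real secret, "registry.example.com" is your registry hostname.
AUTH=$(printf 'user:tok' | base64)
CONFIG='{"auths":{"registry.example.com":{"username":"user","password":"tok","auth":"'"$AUTH"'"}}}'
# Must parse as JSON, just like the decoded secret
echo "$CONFIG" | python3 -m json.tool >/dev/null && echo "valid JSON"
# The auth field must round-trip to username:password
printf '%s' "$AUTH" | base64 -d && echo
```

If the decoded secret's hostname does not match the registry in the Pod's image field (for example, docker.io vs index.docker.io), the runtime falls back to an anonymous pull and you get a 401.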
1d. Test registry connectivity from inside the cluster
kubectl run curl-test --image=curlimages/curl:latest --restart=Never --rm -it -- \
curl -v https://registry.example.com/v2/
Step 2: Fix — Targeted Remediation
Fix A — Typo in image name or tag
Edit the Deployment (or other workload controller) and correct the image reference:
kubectl set image deployment/my-app container-name=myrepo/app:v1.2.3 -n <namespace>
# or edit the manifest directly
kubectl edit deployment my-app -n <namespace>
Fix B — Create or rotate an imagePullSecret
For Docker Hub / generic registry:
kubectl create secret docker-registry regcred \
--docker-server=https://index.docker.io/v1/ \
--docker-username=<user> \
--docker-password=<token> \
--docker-email=<email> \
-n <namespace>
For AWS ECR (the token expires every 12 h, so refresh it with a CronJob or the amazon-ecr-credential-helper):
AWS_ACCOUNT=123456789012
REGION=us-east-1
PASSWORD=$(aws ecr get-login-password --region $REGION)
kubectl create secret docker-registry ecr-secret \
--docker-server=${AWS_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com \
--docker-username=AWS \
--docker-password=$PASSWORD \
-n <namespace>
Then reference the secret in the Pod spec:
spec:
imagePullSecrets:
- name: regcred
containers:
- name: app
image: myrepo/app:v1.2.3
Fix C — Attach the secret to the default ServiceAccount (namespace-wide)
kubectl patch serviceaccount default -n <namespace> \
-p '{"imagePullSecrets": [{"name": "regcred"}]}'
All new pods in the namespace will automatically inherit the pull secret without needing imagePullSecrets in each spec.
Fix D — Expired TLS certificate on private registry
First confirm the certificate expiry:
echo | openssl s_client -connect registry.example.com:443 2>/dev/null | \
openssl x509 -noout -dates
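For scripting, openssl also offers -checkend, which exits non-zero if the certificate expires within the given number of seconds. A self-contained sketch using a throwaway one-day self-signed certificate (in real use you would pipe in the registry's certificate from s_client as above):

```shell
# Generate a throwaway 1-day self-signed cert, then probe its expiry window.
openssl req -x509 -newkey rsa:2048 -keyout /tmp/reg.key -out /tmp/reg.crt \
  -days 1 -nodes -subj "/CN=registry.example.com" 2>/dev/null
openssl x509 -in /tmp/reg.crt -noout -dates
# Exit code 0 = still valid for at least N seconds (here: 1 hour)
if openssl x509 -in /tmp/reg.crt -noout -checkend 3600 >/dev/null; then
  echo "certificate valid for at least 1 hour"
else
  echo "certificate expires within 1 hour"
fi
```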
Renew the certificate on the registry host, then restart the registry service. If you are using cert-manager inside the cluster:
# Force immediate renewal with the cert-manager CLI
cmctl renew registry-tls -n cert-manager
# Check renewal
kubectl describe certificate registry-tls -n cert-manager
Fix E — Distribute a self-signed CA to cluster nodes
If the registry uses an internal CA, every node's container runtime must trust it:
# Ubuntu / Debian nodes
sudo cp my-ca.crt /usr/local/share/ca-certificates/my-ca.crt
sudo update-ca-certificates
sudo systemctl restart containerd
For managed clusters (EKS, GKE, AKS) use a DaemonSet to push the CA at boot, or use the cloud provider's node-bootstrap mechanism.
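A common pattern is a privileged DaemonSet that copies the CA from a ConfigMap into each node's trust store. A hedged sketch (the names node-ca-installer and my-ca, and the Debian-style certificate path, are assumptions; adapt to your node OS, and the runtime still needs a restart to pick up the new CA):

```yaml
# Hypothetical DaemonSet: install an internal CA on every node.
# Assumes a ConfigMap "my-ca" (key my-ca.crt) exists in kube-system.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-ca-installer
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-ca-installer
  template:
    metadata:
      labels:
        app: node-ca-installer
    spec:
      hostPID: true
      containers:
      - name: installer
        image: alpine:3.19
        securityContext:
          privileged: true
        command: ["/bin/sh", "-c"]
        # Copy the CA into the host trust dir, then refresh the host CA
        # bundle by entering the host mount namespace (PID 1).
        args:
        - cp /ca/my-ca.crt /host-ca/my-ca.crt &&
          nsenter -t 1 -m -- update-ca-certificates &&
          sleep infinity
        volumeMounts:
        - name: ca
          mountPath: /ca
        - name: host-ca
          mountPath: /host-ca
      volumes:
      - name: ca
        configMap:
          name: my-ca
      - name: host-ca
        hostPath:
          path: /usr/local/share/ca-certificates
```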
Fix F — Docker Hub rate limit (429 toomanyrequests)
# Add Docker Hub credentials to avoid anonymous limits
kubectl create secret docker-registry dockerhub-creds \
--docker-server=https://index.docker.io/v1/ \
--docker-username=<user> \
--docker-password=<access-token> \
-n <namespace>
Long-term: mirror frequently-used public images to a private registry or use a pull-through cache (Harbor, Nexus, AWS ECR pull-through cache).
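With containerd 1.5+, a per-registry hosts.toml can route docker.io pulls through a mirror transparently. A sketch, assuming mirror.example.com as a placeholder mirror and that containerd's registry config_path points at /etc/containerd/certs.d (restart containerd after the change):

```toml
# Hypothetical /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
```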
Step 3: Verify the Fix
# Watch pod status in real time
kubectl get pods -n <namespace> -w
# Confirm the image is being pulled
kubectl describe pod <pod-name> -n <namespace> | grep -A5 Events
# Check that the pod reaches Running state
kubectl wait --for=condition=Ready pod/<pod-name> -n <namespace> --timeout=120s
Related Issues to Investigate After Fixing ImagePullBackOff
Once the image pulls successfully, the pod may still fail with CrashLoopBackOff (the application crashes on startup), OOMKilled (memory limits too low), or Permission denied errors in the application logs. Use kubectl logs <pod> --previous to retrieve logs from the last crashed container and continue debugging from there.
For Kubernetes timeout errors during image pulls in high-latency environments, increase the kubelet's --image-pull-progress-deadline flag (default 1 minute; note it only applies to the legacy Docker runtime and was removed along with dockershim), pre-pull large images onto nodes, or switch to a closer registry mirror.
Bonus: All-in-One Diagnostic Script
#!/usr/bin/env bash
# ============================================================
# Kubernetes ImagePullBackOff Diagnostic Script
# Usage: ./diagnose-imagepullbackoff.sh <namespace> [pod-name]
# ============================================================
NS=${1:-default}
POD=${2:-""}
echo "=== Pods in ImagePullBackOff or ErrImagePull ==="
kubectl get pods -n "$NS" --field-selector=status.phase!=Running \
| grep -E 'ImagePullBackOff|ErrImagePull|Init:ImagePullBackOff'
if [[ -n "$POD" ]]; then
echo ""
echo "=== Events for pod: $POD ==="
kubectl describe pod "$POD" -n "$NS" | awk '/Events:/,0'
echo ""
echo "=== imagePullSecrets on pod ==="
kubectl get pod "$POD" -n "$NS" \
-o jsonpath='{.spec.imagePullSecrets[*].name}' && echo
echo ""
echo "=== Image references ==="
kubectl get pod "$POD" -n "$NS" \
-o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
fi
echo ""
echo "=== Secrets in namespace $NS ==="
kubectl get secrets -n "$NS" --field-selector type=kubernetes.io/dockerconfigjson
echo ""
echo "=== ServiceAccount imagePullSecrets ==="
kubectl get serviceaccount default -n "$NS" \
-o jsonpath='{.imagePullSecrets}' && echo
echo ""
echo "=== Node disk pressure (can block image pulls) ==="
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
echo ""
echo "=== Recent kubelet image pull events (all namespaces) ==="
kubectl get events --all-namespaces \
--field-selector reason=Failed \
--sort-by='.lastTimestamp' \
| grep -i 'pull\|image' | tail -20
# --- Registry TLS certificate check ---
# Set REGISTRY_HOST before running if you have a private registry
if [[ -n "$REGISTRY_HOST" ]]; then
echo ""
echo "=== TLS certificate expiry for $REGISTRY_HOST ==="
echo | openssl s_client -connect "${REGISTRY_HOST}:443" 2>/dev/null \
| openssl x509 -noout -subject -dates
fi
Error Medic Editorial
The Error Medic Editorial team is composed of senior SRE and DevOps engineers with hands-on experience operating Kubernetes clusters at scale across AWS, GCP, and Azure. We write precise, command-first troubleshooting guides that help engineers resolve production incidents quickly.
Sources
- https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting
- https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
- https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html
- https://github.com/kubernetes/kubernetes/issues/18787
- https://stackoverflow.com/questions/32510310/kubernetes-imagepullbackoff