How to Fix ImagePullBackOff and Evicted Pods in Kubernetes
Comprehensive guide to troubleshooting and fixing Kubernetes ImagePullBackOff, ErrImagePull, and Evicted pod statuses. Learn root causes and permanent fixes.
- ImagePullBackOff usually means the container image is missing, the tag is wrong, or registry authentication (ImagePullSecrets) is failing.
- Evicted pods are typically caused by node resource pressure, most commonly exhausted memory or ephemeral storage.
- Use 'kubectl describe pod <pod-name>' to identify the exact reason for ImagePullBackOff or Eviction.
- The 'cluster-autoscaler.kubernetes.io/safe-to-evict' annotation controls whether the Cluster Autoscaler can evict a pod during node scale-down.
- Clear evicted pods in bulk using 'kubectl delete pods --field-selector status.phase=Failed'.
| Method | When to Use | Effort | Risk |
|---|---|---|---|
| Verify Image Tag/Name | When 'kubectl describe' shows 'NotFound' or 'manifest unknown' | Low | Low |
| Create ImagePullSecret | When pulling from a private registry results in 'Unauthorized' or 'Access Denied' | Medium | Low |
| Increase Node Resources/Requests | When pods are Evicted due to Memory or Ephemeral Storage pressure | High | Medium |
| Add safe-to-evict Annotation | When Cluster Autoscaler refuses to scale down a node due to local storage pods | Low | Low |
Understanding ImagePullBackOff and Evicted Pods in Kubernetes
When managing a Kubernetes cluster, whether it's on Azure Kubernetes Service (AKS), Amazon EKS, Google GKE, or Docker Desktop, encountering pod lifecycle errors is inevitable. Two of the most common and disruptive statuses you will encounter are ImagePullBackOff (often preceded by ErrImagePull) and Evicted.
While they manifest differently, both indicate that Kubernetes cannot run your workload as requested. ImagePullBackOff is a failure at the container startup phase, whereas Evicted means a running pod was forcefully terminated by the kubelet to save the node from complete resource starvation.
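To see whether any workloads are currently stuck in these states, you can filter kubectl get pods output. The snippet below runs the filter against hardcoded sample output so it works standalone; in a real cluster, pipe kubectl get pods -A into the same grep.

```shell
# Filter pod listings for the failure statuses covered in this article.
# The here-doc stands in for live output; in a cluster, run:
#   kubectl get pods -A | grep -E 'ImagePullBackOff|ErrImagePull|Evicted'
grep -E 'ImagePullBackOff|ErrImagePull|Evicted' <<'EOF'
NAMESPACE   NAME       READY   STATUS             RESTARTS   AGE
default     web-7d4f   0/1     ImagePullBackOff   0          5m
default     api-9c2a   1/1     Running            0          2d
batch       job-x1     0/1     Evicted            0          1h
EOF
```

Only the two failing pods survive the filter; healthy Running pods are dropped.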
Diagnosing ImagePullBackOff and ErrImagePull
The ImagePullBackOff status means that Kubernetes tried to pull the container image specified in your pod manifest, failed, and is now backing off (delaying) further attempts. The initial failure state is ErrImagePull.
Root Causes of ImagePullBackOff
- Typo in the Image Name or Tag: The most common cause. If you specify nginx:latestt instead of nginx:latest, the container runtime cannot find the manifest.
- Private Registry Authentication: If you are using a private registry (like Azure Container Registry or AWS ECR) and haven't provided the correct credentials via an ImagePullSecret, the registry will reject the pull request with an Unauthorized error.
- Network Constraints: The worker node might not have outbound internet access or DNS resolution to reach the container registry.
- Rate Limiting: Docker Hub and other public registries impose rate limits. If your cluster shares a single NAT gateway IP, you might be hitting the toomanyrequests error.
Step 1: Diagnose the Pull Failure
To find out exactly why the image pull is failing, describe the pod:
kubectl describe pod <pod-name> -n <namespace>
Scroll to the Events section at the bottom. You will likely see something like:
Failed to pull image "myregistry.azurecr.io/my-app:v1": rpc error: code = Unknown desc = Error response from daemon: Get "https://myregistry.azurecr.io/v2/my-app/manifests/v1": unauthorized: authentication required
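The wording of the event message points directly at the fix. As a rough triage aid, the hypothetical helper below (not part of kubectl) pattern-matches the message text against the known failure substrings; the MSG value here is the example message from above.

```shell
# Hypothetical triage helper: map a pull-failure event message
# to its most likely fix by matching known substrings.
MSG='Failed to pull image "myregistry.azurecr.io/my-app:v1": unauthorized: authentication required'
case "$MSG" in
  *unauthorized*|*"access denied"*)
    echo "Fix: create an imagePullSecret and reference it in the pod spec" ;;
  *"manifest unknown"*|*"not found"*)
    echo "Fix: correct the image name or tag" ;;
  *toomanyrequests*)
    echo "Fix: registry rate limit hit; authenticate or mirror the image" ;;
  *)
    echo "Check node network and DNS access to the registry" ;;
esac
```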
Step 2: Fix ImagePullBackOff
- For Typos: Correct the deployment manifest and run kubectl apply -f deployment.yaml.
- For Private Registries: Create a secret containing your Docker credentials:
kubectl create secret docker-registry my-registry-secret \
--docker-server=myregistry.azurecr.io \
--docker-username=<your-username> \
--docker-password=<your-password> \
--docker-email=<your-email>
Then, add imagePullSecrets to your Pod spec:
spec:
  containers:
    - name: my-app
      image: myregistry.azurecr.io/my-app:v1
  imagePullSecrets:
    - name: my-registry-secret
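As an alternative to listing imagePullSecrets on every pod, you can attach the secret to the namespace's ServiceAccount, so every pod using that account pulls with those credentials automatically. A sketch, assuming the default ServiceAccount and the secret name created above:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-namespace   # assumption: substitute your namespace
imagePullSecrets:
  - name: my-registry-secret
```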
Understanding Pod Eviction in Kubernetes
An Evicted pod status means the kubelet on a worker node terminated the pod. This is not a crash; it's a deliberate action taken by the node to preserve its own stability.
The Kubernetes Eviction Policy
Kubernetes monitors node resources heavily. If a node starts running out of critical, incompressible resources (like memory or disk space), it triggers the eviction policy.
Common eviction triggers include:
- MemoryPressure: The node is running out of RAM.
- DiskPressure / Ephemeral Storage: The node's root filesystem or the container runtime's image filesystem is full. Pods writing large amounts of data to emptyDir volumes or their local container filesystem without requesting ephemeral-storage limits are prime culprits.
- PIDPressure: Too many processes are running on the node.
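These pressure signals correspond to configurable kubelet eviction thresholds. A sketch of the relevant KubeletConfiguration fields, with illustrative values close to the upstream defaults (verify against your distribution's actual settings):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # evict when free RAM drops below this
  nodefs.available: "10%"     # node root filesystem
  imagefs.available: "15%"    # container runtime image filesystem
  pid.available: "10%"        # free process IDs
```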
When you see a pod stuck in the Evicted state, it leaves behind a tombstone record. The pod itself is dead, but the API object remains so you can inspect the eviction logs and status.
Step 1: Diagnose Pod Eviction
Describe the evicted pod to see the exact reason:
kubectl describe pod <evicted-pod-name>
Look at the Status and Message fields. You'll often see something like:
Message: The node was low on resource: ephemeral-storage. Container my-app was using 50Gi, which exceeds its request of 0.
Step 2: Prevent Eviction
To prevent pods from getting evicted:
- Set Resource Requests and Limits: Always define requests and limits for CPU, memory, and importantly, ephemeral-storage.
- Optimize Logging: If an application logs excessively to stdout/stderr, those logs consume local disk space until log rotation occurs. Use log forwarding to offload them.
- Use Persistent Volumes: Don't use emptyDir for large datasets. Attach a PersistentVolumeClaim (PVC).
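Putting the first recommendation into practice, a container spec with explicit requests and limits, including ephemeral-storage, might look like this (values are illustrative; tune them to your workload):

```yaml
spec:
  containers:
    - name: my-app
      image: myregistry.azurecr.io/my-app:v1
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
          ephemeral-storage: "1Gi"
        limits:
          memory: "512Mi"
          ephemeral-storage: "2Gi"
```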
The Role of cluster-autoscaler.kubernetes.io/safe-to-evict
Sometimes you want pods to be evicted, specifically when the Cluster Autoscaler is trying to scale down an underutilized node. By default, the autoscaler will not evict certain pods, such as those using local storage (emptyDir). This prevents the node from scaling down.
If your pod uses emptyDir strictly for temporary, non-critical cache and you want the autoscaler to feel free to terminate it to save cloud costs, add the following annotation to your pod spec:
metadata:
annotations:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
Conversely, if you have a critical pod that should never be randomly evicted during scale-down, you can set this to "false".
Cleaning Up Evicted Pods
Kubernetes does not automatically delete evicted pods immediately because it assumes you want to read their failure messages. Over time, these can clutter your dashboard and CLI output.
You can manually clean up evicted pods using a field selector. See the code block below for the exact command.
Quick Reference Commands
# 1. Diagnose ImagePullBackOff by checking pod events
kubectl describe pod <pod-name> -n <namespace>
# 2. Create an ImagePullSecret for a private registry
kubectl create secret docker-registry my-registry-key \
--docker-server=your-registry.com \
--docker-username=your-user \
--docker-password=your-pwd \
--docker-email=your-email@example.com
# 3. Find all Evicted pods across all namespaces
kubectl get pods --all-namespaces | grep Evicted
# 4. Clean up (delete) all Evicted pods in the current namespace
kubectl delete pods --field-selector status.phase=Failed
# 5. Clean up all Evicted pods in ALL namespaces
kubectl delete pods --all-namespaces --field-selector status.phase=Failed

Error Medic Editorial
Error Medic Editorial is a team of certified Kubernetes administrators and DevOps engineers dedicated to simplifying cloud-native troubleshooting and site reliability.
Sources
- https://kubernetes.io/docs/concepts/containers/images/#imagepullbackoff
- https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
- https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
- https://stackoverflow.com/questions/32723111/how-to-remove-old-and-evicted-pods-in-kubernetes