Error Medic

Resolving 'Amazon RDS Storage Full' and 'EKS Node Not Ready' Errors

Fix Amazon RDS storage full states and resolve EKS node not ready errors. Step-by-step troubleshooting, AWS CLI commands, and root cause analysis.

Key Takeaways
  • Amazon RDS enters a storage-full state when available space drops to zero, suspending all write operations.
  • EKS nodes frequently enter a NotReady state due to resource exhaustion (DiskPressure/MemoryPressure) or kubelet daemon failures.
  • Quick fix for RDS: Modify the instance via AWS CLI to increase allocated storage, or enable Storage Autoscaling.
  • Quick fix for EKS: Inspect node conditions using kubectl describe node, clear unused images, or restart the kubelet service.
Fix Approaches Compared
Method | When to Use | Time | Risk
Increase RDS allocated storage | Instance is in the storage-full state and the DB is unresponsive | 10-30 mins | Low
Enable RDS Storage Autoscaling | Proactively, to prevent future storage-full events | 5 mins | Low
Clear EKS node disk space | Node is NotReady due to DiskPressure | 5-10 mins | Medium
Restart EKS kubelet | Node is unresponsive due to PLEG timeout or kubelet crash | 2 mins | Low

Understanding the Interconnected Failures

In modern cloud architectures, infrastructure components are tightly coupled. A critical database failure, such as an Amazon RDS instance hitting the storage-full state, can cause cascading application failures. While the two seem unrelated, a massive spike in application log output caused by database connection errors can exhaust local disk space on Kubernetes worker nodes, pushing EKS nodes into a NotReady status. This guide tackles both issues, as they often appear together during major outages.

Part 1: Troubleshooting Amazon RDS Storage Full

When your database runs out of disk space, the instance enters the storage-full state. This is particularly critical for PostgreSQL, which needs free disk space to write its Write-Ahead Log (WAL); without it, the database aggressively halts transactions to prevent corruption.

Step 1: Diagnose the RDS Issue

The first indicator is usually a monitoring alert on the FreeStorageSpace CloudWatch metric, and the instance status will show as storage-full in the AWS Management Console.

In a PostgreSQL storage-full scenario, the database logs will typically show:

PANIC: could not write to file "pg_wal/xlogtemp.123": No space left on device
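Those log lines can also be pulled over the CLI without console access. A minimal sketch, assuming the instance identifier my-production-db and the log file name error/postgresql.log (both placeholders; use the name the first command returns):

```shell
# Find the most recently written log file for the instance
aws rds describe-db-log-files \
  --db-instance-identifier my-production-db \
  --query 'reverse(sort_by(DescribeDBLogFiles,&LastWritten))[0].LogFileName' \
  --output text

# Download a portion of that log and scan for WAL write failures
# (log file name below is a placeholder; substitute the name returned above)
aws rds download-db-log-file-portion \
  --db-instance-identifier my-production-db \
  --log-file-name error/postgresql.log \
  --output text | grep -i "no space left on device"
```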

Step 2: Fix the RDS Storage Full Issue

To recover, you must increase the storage capacity. While an instance is in the storage-full state, little beyond modifying the storage size is possible.

  1. Modify the instance: Increase the allocated storage by at least 10% (or a minimum of 10 GiB) to give the database room to recover.
  2. Enable autoscaling: To prevent a recurrence, enable RDS Storage Autoscaling with a maximum storage threshold that accommodates your growth.
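The 10%-or-10-GiB rule above can be computed before issuing the modify call. A minimal sketch: my-production-db and the current size are placeholders, and the modify command is only echoed here rather than executed, so the snippet is safe to copy:

```shell
# Compute a new allocation at least 10% (and at least 10 GiB) above current
next_size() {
  local current=$1
  local inc=$(( current / 10 ))
  [ "$inc" -lt 10 ] && inc=10
  echo $(( current + inc ))
}

CURRENT_GB=500                      # placeholder: read from describe-db-instances
NEW_GB=$(next_size "$CURRENT_GB")

# Echoed rather than run, so nothing is modified by accident
echo aws rds modify-db-instance \
  --db-instance-identifier my-production-db \
  --allocated-storage "$NEW_GB" \
  --apply-immediately
```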

Part 2: Troubleshooting EKS Node Not Ready

Simultaneously, you might receive alerts that an EKS node is NotReady. When a node transitions to NotReady, the Kubernetes control plane stops scheduling new pods onto it and eventually begins evicting the existing ones.

Step 1: Diagnose the EKS Node

A NotReady node usually points to the kubelet. The kubelet posts periodic status updates to the control plane; if those heartbeats stop, the control plane marks the node NotReady. Common culprits include:

  • DiskPressure: The node's root filesystem or container runtime filesystem is out of space (often due to out-of-control container logs caused by the aforementioned RDS outage).
  • MemoryPressure: The node is out of memory.
  • Network/CNI Issues: The aws-node DaemonSet (VPC CNI) is crashing.
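On a large cluster, the NotReady nodes can be filtered out of kubectl get nodes with awk. A minimal sketch against a hypothetical sample of the command's output; on a live cluster, pipe kubectl get nodes --no-headers into the same filter:

```shell
# Hypothetical `kubectl get nodes` output, for illustration only
sample='NAME                         STATUS     ROLES    AGE   VERSION
ip-10-0-1-123.ec2.internal   NotReady   <none>   40d   v1.29.0
ip-10-0-2-45.ec2.internal    Ready      <none>   40d   v1.29.0'

# Print only node names whose STATUS column is exactly NotReady
echo "$sample" | awk 'NR > 1 && $2 == "NotReady" { print $1 }'

# Live cluster equivalent:
#   kubectl get nodes --no-headers | awk '$2 == "NotReady" { print $1 }'
```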
Step 2: Fix the EKS Node
  1. Describe the Node: Run kubectl describe node <node-name> and look at the Conditions section. If you see DiskPressure=True, you need to clear space.
  2. Check Kubelet Logs: SSH or use AWS Systems Manager Session Manager to access the underlying EC2 instance and check the kubelet logs: journalctl -u kubelet -f.
  3. Restart Services: Often, simply restarting the container runtime or the kubelet resolves transient lockups.
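For the DiskPressure case, a quick check-then-clean sequence on the worker node might look like the sketch below. The 85% threshold is an arbitrary example, and the destructive commands are only echoed so nothing is pruned by accident:

```shell
# Read root filesystem usage as a bare integer percentage (GNU coreutils df)
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "root filesystem usage: ${usage}%"

if [ "$usage" -ge 85 ]; then
  # Candidate cleanups on an EKS worker node (run via SSH/SSM):
  echo "sudo crictl rmi --prune            # remove unused container images"
  echo "sudo journalctl --vacuum-size=200M # cap the systemd journal"
fi
```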

Command Reference

```bash
# --- RDS Diagnostics & Remediation ---

# Check current RDS instance status and storage
aws rds describe-db-instances \
  --db-instance-identifier my-production-db \
  --query 'DBInstances[*].[DBInstanceStatus,AllocatedStorage,MaxAllocatedStorage]' \
  --output table

# Modify RDS instance to increase storage (e.g., to 500GB) and enable autoscaling
aws rds modify-db-instance \
  --db-instance-identifier my-production-db \
  --allocated-storage 500 \
  --max-allocated-storage 1000 \
  --apply-immediately


# --- EKS Diagnostics & Remediation ---

# Find nodes that are NotReady
kubectl get nodes | grep NotReady

# Get detailed conditions for a specific not ready node
kubectl describe node ip-10-0-1-123.ec2.internal | grep -A 5 Conditions

# (Run on the actual EKS worker node via SSH/SSM to check kubelet)
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100 --no-pager

# Restart kubelet to attempt recovery
sudo systemctl restart kubelet
```

Error Medic Editorial

Error Medic Editorial consists of senior Site Reliability Engineers and Cloud Architects dedicated to providing actionable, code-first solutions for complex infrastructure outages.
