Fixing 'Stale file handle' Errors in NFS Mounts on Linux
Resolve NFS 'Stale file handle' errors on Linux. Learn root causes, diagnostic steps, and fixes including lazy unmounting, clearing caches, and server checks.
- A 'Stale file handle' (ESTALE) occurs when a client tries to access a file or directory that has been deleted, replaced, or moved on the NFS server.
- Network disruptions, server reboots, or changes to export settings can also invalidate file handles.
- The quickest and usually least disruptive fix is a lazy unmount (umount -l) followed by a remount.
- If ghost files persist after remounting, dropping the kernel's dentry and inode caches may be needed to discard the stale cached entries.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Lazy Unmount (`umount -l`) | Active processes block normal unmount | < 1 min | Low (active I/O may fail) |
| Force Unmount (`umount -f`) | NFS server is completely unreachable | < 1 min | Medium (can hang) |
| Clear Dentry Cache | Ghost files persist after server updates | < 1 min | Low (temporary performance hit) |
| Remount (`mount -a`) | After successfully unmounting stale mounts | < 1 min | Low |
| Restart NFS Client Service | Persistent state corruption across multiple mounts | 2-5 mins | High (disrupts all client mounts) |
Understanding the Error
The ESTALE error, commonly seen as Stale file handle, is a frequent and frustrating issue in Network File System (NFS) environments on Linux. This error is returned by the NFS server to the client when the client requests access to a file, directory, or mount point using a file handle that the server no longer recognizes as valid.
In the NFS protocol, a file handle is a unique opaque identifier generated by the server for every exported file and directory. When a client mounts an NFS export, it receives the file handle for the root of that export. As the client navigates the directory structure, it continuously obtains and caches file handles for other files and directories.
A file handle becomes 'stale' typically because the underlying file or directory has been removed, renamed, or replaced on the server, but the client still holds the old handle in its local cache. This often happens when another client, or a local process on the server itself, modifies the filesystem.
Common Root Causes
- File or Directory Deletion: Another system or user deleted the exact file or directory the client is currently trying to read or write to.
- Server Reboot or Export Changes: The NFS server was restarted, or the export configuration (/etc/exports) was modified and re-exported (exportfs -arv). If filesystems are not exported with a static fsid, the file system IDs can change, invalidating all existing handles.
- Underlying Filesystem Modifications: The underlying filesystem on the server was unmounted and remounted, a snapshot was restored, or the inode structure was altered by a filesystem check (fsck).
Step 1: Diagnose
Before attempting a fix, you must verify exactly which mount point is affected. You will often see the error in application logs, cron job outputs, or when running standard filesystem commands like ls, stat, or df.
Run the df command. If it hangs or outputs a stale file handle error, you have located the problematic mount:
df -h
# Output might include: df: /mnt/nfs_data: Stale file handle
You can also check the kernel ring buffer for NFS-related error messages, which can provide insight into when the issue began:
dmesg | grep -i nfs
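Because df can block on a hung mount, it can help to probe each NFS mount individually with a time limit. The following is a minimal sketch rather than a definitive tool: the function names list_nfs_mounts and check_mount and the 5-second limit are assumptions made for illustration, and it relies on the coreutils timeout command and on the untranslated (C locale) wording of the error message.

```shell
#!/bin/sh
# list_nfs_mounts [mounts_file]
# Print the mount point of every NFS mount. Reads /proc/mounts by default.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# check_mount <path>
# Print one of: ok, stale, hung, error.
check_mount() {
    # Capture stderr, discard stdout; LC_ALL=C keeps the message untranslated.
    err=$(LC_ALL=C timeout 5 stat -- "$1" 2>&1 >/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo ok
    elif [ "$rc" -eq 124 ]; then
        echo hung    # timeout(1) exits with 124 when the time limit is reached
    elif printf '%s\n' "$err" | grep -qi 'stale file handle'; then
        echo stale
    else
        echo error
    fi
}
```

A loop such as `list_nfs_mounts | while read -r m; do echo "$m: $(check_mount "$m")"; done` would then print one status per mount.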
Step 2: Fix
Approach A: The Lazy Unmount (Recommended)
The most common and effective way to clear a stale mount on the client is to unmount it and mount it again. However, a standard umount command will almost always fail if processes are still trying to access the directory, returning a device is busy error.
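To see which processes are keeping the mount busy, one option is to look for processes whose working directory lies inside it. This is a minimal sketch assuming a Linux /proc filesystem; the function name procs_with_cwd_under is hypothetical. It reads only the /proc/<pid>/cwd symlinks rather than walking the mount itself, and it only sees processes the current user is allowed to inspect.

```shell
#!/bin/sh
# procs_with_cwd_under <dir>
# Print the PID of every process whose working directory is <dir> or below it.
procs_with_cwd_under() {
    target=${1%/}                          # drop a trailing slash, if any
    for link in /proc/[0-9]*/cwd; do
        # Skip processes that vanished or that we may not inspect.
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd/" in
            "$target"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}
```

Running `procs_with_cwd_under /mnt/nfs_data` as root gives the fullest list. Processes with open files on the mount but a working directory elsewhere are not covered by this check.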
A 'lazy' unmount (-l) detaches the filesystem from the namespace immediately and cleans up all references to the filesystem in the background as soon as it is not busy anymore. This is the safest way to break the deadlock.
sudo umount -l /mnt/nfs_data
After the lazy unmount, you can safely remount the share to obtain fresh file handles from the server:
# Works when /mnt/nfs_data has an entry in /etc/fstab:
sudo mount /mnt/nfs_data
# Or mount every filesystem listed in /etc/fstab:
sudo mount -a
Approach B: Force Unmount
If the NFS server is completely unreachable due to a network partition and the mount is severely hung, a force unmount might be necessary.
sudo umount -f /mnt/nfs_data
Note: umount -f can sometimes still hang indefinitely on unreachable TCP-based NFS mounts. In those specific scenarios, umount -l remains the preferred method to regain control of the terminal.
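The fallback described in the note can be sketched as a small wrapper that gives the force unmount a time limit and then falls back to a lazy unmount. This is an illustrative sketch only: the function name force_or_lazy_umount, the 10-second limit, and the DRY_RUN variable are assumptions, and the function is meant to be run as root.

```shell
#!/bin/sh
# force_or_lazy_umount <mountpoint>
# Try "umount -f" with a time limit; fall back to "umount -l" if it fails.
# Set DRY_RUN=1 to print the commands instead of running them.
force_or_lazy_umount() {
    mnt=$1
    run=${DRY_RUN:+echo}    # expands to "echo" in dry-run mode, else to nothing
    if $run timeout 10 umount -f "$mnt"; then
        return 0
    fi
    echo "umount -f failed or timed out on $mnt; trying umount -l" >&2
    $run umount -l "$mnt"
}
```

Calling `DRY_RUN=1 force_or_lazy_umount /mnt/nfs_data` first shows what would be executed without touching the mount.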
Approach C: Clearing the Dentry Cache
Sometimes the mount itself is functional, but specific files or subdirectories within it return the stale handle error. This happens when the Linux client keeps serving cached inodes and directory entries (dentries) that no longer match what is on the server.
You can force the Linux kernel to drop its directory entry and inode caches.
Warning: This will drop caches for the entire system, causing a temporary but noticeable spike in disk I/O as caches rebuild from disk.
sync; echo 2 | sudo tee /proc/sys/vm/drop_caches
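To observe the effect of dropping caches, the kernel's dentry counters can be read before and after. This is a minimal sketch; the helper name dentry_counts is hypothetical. It reads /proc/sys/fs/dentry-state, whose first two fields are the number of allocated dentries and the number of unused ones.

```shell
#!/bin/sh
# dentry_counts [state_file]
# Print the total and unused dentry counts from the kernel's dentry-state file.
dentry_counts() {
    awk '{ printf "total=%s unused=%s\n", $1, $2 }' \
        "${1:-/proc/sys/fs/dentry-state}"
}
```

Running dentry_counts, then the drop_caches command, then dentry_counts again should typically show the unused count fall sharply.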
Step 3: Server-Side Prevention
To prevent this issue from recurring, ensure that your /etc/exports file on the NFS server uses static fsid options, especially if you are exporting filesystems that might be unmounted/remounted, like removable block devices or specific logical volumes (LVM).
Example /etc/exports configuration:
/srv/nfs/shared_data 10.0.0.0/24(rw,sync,no_subtree_check,fsid=100)
The no_subtree_check option also matters for stability. Subtree checking has minor security benefits but makes stale file handles more likely when files are renamed within the export. With subtree checking enabled, the server verifies that each requested file still belongs to the exported subdirectory; if a file is renamed out of that subdirectory, its handle goes stale. Disabling subtree checking is recommended, and no_subtree_check is already the default in modern NFS servers.
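A quick way to audit an exports file for entries that lack a static fsid is sketched below. The function name exports_missing_fsid is hypothetical, and the check is deliberately simple: it ignores backslash line continuations and quoted paths, and it treats any occurrence of fsid= on a line as sufficient.

```shell
#!/bin/sh
# exports_missing_fsid [exports_file]
# Print the exported path of each entry that has no fsid= option.
exports_missing_fsid() {
    awk '
        /^[ \t]*(#|$)/ { next }         # skip comment lines and blank lines
        $0 !~ /fsid=/  { print $1 }     # no fsid= anywhere on this entry
    ' "${1:-/etc/exports}"
}
```

With no argument the function reads /etc/exports on the server; an empty result means every entry it could parse carries an fsid= option.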
# 1. Identify the hung mount (this may hang, use Ctrl+C if it does)
df -h
# 2. Check kernel logs for NFS errors to confirm ESTALE
dmesg -T | grep -i nfs
# 3. Perform a lazy unmount of the problematic mount point
sudo umount -l /path/to/stale/mount
# 4. Drop kernel caches to clear out bad dentries/inodes
sudo sync; echo 2 | sudo tee /proc/sys/vm/drop_caches
# 5. Remount the filesystem to get fresh file handles
sudo mount -a
# OR remount specifically
sudo mount /path/to/stale/mount
# 6. Verify the mount is accessible again
ls -la /path/to/stale/mount
Error Medic Editorial
The Error Medic Editorial team consists of senior DevOps engineers, Site Reliability Engineers (SREs), and system administrators dedicated to providing actionable, real-world solutions to complex infrastructure challenges.