Error Medic

Fixing 'Stale file handle' Errors in NFS Mounts on Linux

Resolve NFS 'Stale file handle' errors on Linux. Learn root causes, diagnostic steps, and fixes including lazy unmounting, clearing caches, and server checks.

Key Takeaways
  • A 'Stale file handle' (ESTALE) occurs when a client tries to access a file or directory that has been deleted, replaced, or moved on the NFS server.
  • Network disruptions, server reboots, or changes to export settings can also invalidate file handles.
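  • The quickest fix is usually a lazy unmount (umount -l) followed by a remount. It avoids a reboot, but I/O still in flight on the old mount may fail.
  • If ghost files persist after a remount, dropping the kernel's dentry and inode caches forces the client to look the affected entries up again.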
Fix Approaches Compared
| Method | When to Use | Time | Risk |
| --- | --- | --- | --- |
| Lazy Unmount (`umount -l`) | Active processes block normal unmount | < 1 min | Low (active I/O may fail) |
| Force Unmount (`umount -f`) | NFS server is completely unreachable | < 1 min | Medium (can hang) |
| Clear Dentry Cache | Ghost files persist after server updates | < 1 min | Low (temporary performance hit) |
| Remount (`mount -a`) | After successfully unmounting stale mounts | < 1 min | Low |
| Restart NFS Client Service | Persistent state corruption across multiple mounts | 2-5 min | High (disrupts all client mounts) |

Understanding the Error

The ESTALE error, commonly seen as Stale file handle, is a frequent and frustrating issue in Network File System (NFS) environments on Linux. This error is returned by the NFS server to the client when the client requests access to a file, directory, or mount point using a file handle that the server no longer recognizes as valid.

In the NFS protocol, a file handle is a unique opaque identifier generated by the server for every exported file and directory. When a client mounts an NFS export, it receives the file handle for the root of that export. As the client navigates the directory structure, it continuously obtains and caches file handles for other files and directories.

A file handle becomes 'stale' typically because the underlying file or directory has been removed, renamed, or replaced on the server, but the client still holds the old handle in its local cache. This often happens when another client, or a local process on the server itself, modifies the filesystem.
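
To make the "replaced" case concrete, here is a small sketch that runs entirely in a local temporary directory (no NFS involved). It shows that replacing a file by renaming a new one over it gives the name a different inode, while rewriting the file in place keeps the same inode. It assumes GNU stat (for `-c %i`), and the file names are made up for illustration. On many servers the file handle is tied to the inode, so a client still holding a handle for the old inode is the one that would later see ESTALE.

```shell
#!/usr/bin/env bash
# Local demonstration only: shows inode changes, not NFS traffic.
demo_dir=$(mktemp -d)

echo "v1" > "$demo_dir/config"
inode_original=$(stat -c %i "$demo_dir/config")

# In-place rewrite: the inode is unchanged, so a cached handle would stay valid.
echo "v2" > "$demo_dir/config"
inode_rewritten=$(stat -c %i "$demo_dir/config")

# Replace-by-rename (a pattern many editors and deploy tools use):
# the name now points at a different inode and the old one is released.
echo "v3" > "$demo_dir/config.new"
mv "$demo_dir/config.new" "$demo_dir/config"
inode_replaced=$(stat -c %i "$demo_dir/config")

echo "original:  $inode_original"
echo "rewritten: $inode_rewritten"
echo "replaced:  $inode_replaced"
```

The first two numbers match and the third differs, which is why a deployment that swaps files by rename on the server is a common trigger for stale handles on clients.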

Common Root Causes

  1. File or Directory Deletion: Another system or user deleted the exact file or directory the client is currently trying to read or write to.
  2. Server Reboot or Export Changes: The NFS server was restarted, or the export configuration (/etc/exports) was modified and re-exported (exportfs -arv). If filesystems are not exported with a static fsid, the file system IDs can change, invalidating all existing handles.
  3. Underlying Filesystem Modifications: The underlying filesystem on the server was unmounted and remounted, a snapshot was restored, or the inode structure was altered by a filesystem check (fsck).

Step 1: Diagnose

Before attempting a fix, you must verify exactly which mount point is affected. You will often see the error in application logs, cron job outputs, or when running standard filesystem commands like ls, stat, or df.

Run the df command. If it hangs or outputs a stale file handle error, you have located the problematic mount:

df -h
# Output might include: df: /mnt/nfs_data: Stale file handle
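
Because df can hang on a dead mount, it may help to probe each NFS mount separately with a time limit. The following is a minimal sketch, not a hardened tool: the function names list_nfs_mounts and probe_mount are hypothetical, the 5-second limit is an arbitrary choice, and it assumes GNU coreutils (timeout, stat). Mount points containing spaces appear escaped (as \040) in /proc/mounts and are not handled here.

```shell
#!/usr/bin/env bash
# Print the mount point of every NFS mount in a mounts table.
# $1: path to the table (defaults to /proc/mounts). Fields are:
#     device, mount point, fstype, options, dump, pass.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Probe one path with a time limit so a dead server cannot block the shell.
# Prints "ok", "stale", "timeout", or "error", followed by the path.
probe_mount() {
    local path="$1" err rc
    # LC_ALL=C keeps the error text in English so it can be matched below.
    err=$(LC_ALL=C timeout -s KILL 5 stat -- "$path" 2>&1 >/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok $path"
    elif [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        # 124 is the usual timeout status; 137 (128+9) is reported when
        # the command had to be stopped with SIGKILL.
        echo "timeout $path"
    elif printf '%s' "$err" | grep -q 'Stale file handle'; then
        echo "stale $path"
    else
        echo "error $path"
    fi
}

# Usage (uncomment to check every NFS mount on this client):
# list_nfs_mounts | while read -r mp; do probe_mount "$mp"; done
```

A "timeout" result points to an unreachable server rather than a stale handle, which matters when choosing between the approaches in Step 2.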

You can also check the kernel ring buffer for NFS-related error messages, which can provide insight into when the issue began:

dmesg | grep -i nfs

Step 2: Fix

Approach A: The Lazy Unmount (Recommended)

The most common and effective way to clear a stale mount on the client is to unmount it and mount it again. However, a standard umount will usually fail while processes are still using the directory, returning a "target is busy" error (older systems report "device is busy").

A 'lazy' unmount (-l) detaches the filesystem from the namespace immediately and cleans up all references to the filesystem in the background as soon as it is not busy anymore. This is the safest way to break the deadlock.

sudo umount -l /mnt/nfs_data
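
Processes that were working inside the mount keep their references to the old, detached filesystem and generally need to be restarted. As a rough sketch, the helper below lists PIDs whose current working directory is at or below a given path. It reads the /proc/PID/cwd symlinks rather than touching the mount itself, so it should not block on a dead server. The function name procs_with_cwd_under is hypothetical. It only checks working directories (not open files), and without root it can only see your own processes.

```shell
#!/usr/bin/env bash
# Print the PID of every visible process whose working directory is
# the given directory or somewhere beneath it.
procs_with_cwd_under() {
    local dir="${1%/}" link cwd pid
    for link in /proc/[0-9]*/cwd; do
        # Unreadable links (other users' processes) are skipped.
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd" in
            "$dir"|"$dir"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}

# Usage (run before the lazy unmount to see what will be affected):
# procs_with_cwd_under /mnt/nfs_data
```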

After the lazy unmount, you can safely remount the share to obtain fresh file handles from the server:

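# Remount one share by its mount point (this form needs an /etc/fstab entry for it):
sudo mount /mnt/nfs_data
# Or mount everything listed in /etc/fstab that is not already mounted:
sudo mount -a

Approach B: Force Unmount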

If the NFS server is completely unreachable due to a network partition and the mount is severely hung, a force unmount might be necessary.

sudo umount -f /mnt/nfs_data

Note: umount -f can sometimes still hang indefinitely on unreachable TCP-based NFS mounts. In those specific scenarios, umount -l remains the preferred method to regain control of the terminal.

Approach C: Clearing the Dentry Cache

Sometimes the mount itself is functional, but specific files or subdirectories within it are throwing the stale handle error. This happens because the Linux client has aggressively cached the old inodes and directory entries (dentries) and refuses to ask the server for updated ones.

You can force the Linux kernel to drop its directory entry and inode caches.

Warning: This will drop caches for the entire system, causing a temporary but noticeable spike in disk I/O as caches rebuild from disk.

sync; echo 2 | sudo tee /proc/sys/vm/drop_caches

Step 3: Server-Side Prevention

To prevent this issue from recurring, ensure that your /etc/exports file on the NFS server uses static fsid options, especially if you are exporting filesystems that might be unmounted/remounted, like removable block devices or specific logical volumes (LVM).

Example /etc/exports configuration:

/srv/nfs/shared_data  10.0.0.0/24(rw,sync,no_subtree_check,fsid=100)
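
To audit an existing exports file for these options, a simple line-based check can help. This is only a sketch: the function name check_exports is hypothetical, and the parsing is deliberately naive. It assumes one export per line, does not follow backslash continuation lines, and treats an option as present if it appears anywhere on the line.

```shell
#!/usr/bin/env bash
# Report export lines that lack an explicit fsid= or no_subtree_check.
# $1: exports file to inspect (defaults to /etc/exports).
check_exports() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            if ($0 !~ /fsid=/)
                print "line " NR ": " $1 " has no fsid= option"
            if ($0 !~ /no_subtree_check/)
                print "line " NR ": " $1 " has no no_subtree_check option"
        }
    ' "${1:-/etc/exports}"
}

# Usage on the NFS server:
# check_exports /etc/exports
# After editing the file, re-export with: sudo exportfs -arv
```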

The no_subtree_check option also matters for stability. With subtree checking enabled, the server verifies that each requested file still lies within the exported subdirectory, so a file renamed out of that subdirectory yields a stale handle for any client still using it. Subtree checking has minor security benefits, but it noticeably increases the likelihood of stale file handles when files are renamed. Disabling it is recommended, and it is already the default behavior in modern NFS servers.

Quick Reference: Full Recovery Sequence

# 1. Identify the hung mount (this may hang, use Ctrl+C if it does)
df -h

# 2. Check kernel logs for NFS errors to confirm ESTALE
dmesg -T | grep -i nfs

# 3. Perform a lazy unmount of the problematic mount point
sudo umount -l /path/to/stale/mount

# 4. Drop kernel caches to clear out bad dentries/inodes
sudo sync; echo 2 | sudo tee /proc/sys/vm/drop_caches

# 5. Remount the filesystem to get fresh file handles
sudo mount -a
# OR remount specifically
sudo mount /path/to/stale/mount

# 6. Verify the mount is accessible again
ls -la /path/to/stale/mount

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers, Site Reliability Engineers (SREs), and system administrators dedicated to providing actionable, real-world solutions to complex infrastructure challenges.
