Why does `df -h` hang indefinitely when I have a stale file handle?

To gather size and usage information, `df` queries every mounted filesystem with the `statfs` family of system calls. On a hard-mounted NFS share whose server is unresponsive, that call blocks until the server answers, so `df` hangs. A mount whose handle is merely stale usually returns `Stale file handle` promptly instead of hanging. You can use `df -l` to report only local filesystems and bypass the hang.

Can I fix a stale file handle without unmounting the drive?

Generally, no. If the file handle for the root of the mount point itself is stale, the client must unmount and remount to obtain a completely new, valid file handle from the server. If only a specific file inside the mount is stale, dropping the kernel dentry cache (`echo 2 > /proc/sys/vm/drop_caches`) might force the client to fetch the new handle without requiring a full remount.

What is the difference between `umount -f` and `umount -l`?

`umount -f` (force) is designed to forcefully sever the connection when communication with the server is lost. However, it can still hang if the kernel is stuck waiting for TCP timeouts. `umount -l` (lazy) immediately removes the mount point from the filesystem hierarchy, making it invisible to new processes, and cleans up references in the background once existing processes release their open file descriptors.

How does `no_subtree_check` help prevent stale file handles?

With `subtree_check` enabled, the NFS server verifies not just the file handle, but also that the file still resides in the specifically exported subdirectory. If a user renames a file to a different directory within the same export, its parent directory changes, causing the subtree check to fail and resulting in a stale file handle for the client. `no_subtree_check` disables this strict verification, preventing the error during renames.

Fixing 'Stale file handle' Errors in NFS Mounts on Linux

Fix Approaches Compared
Method	When to Use	Time	Risk
Lazy Unmount (`umount -l`)	Active processes block normal unmount	< 1 min	Low (active I/O may fail)
Force Unmount (`umount -f`)	NFS server is completely unreachable	< 1 min	Medium (can hang)
Clear Dentry Cache	Ghost files persist after server updates	< 1 min	Low (temporary performance hit)
Remount (`mount -a`)	After successfully unmounting stale mounts	< 1 min	Low
Restart NFS Client Service	Persistent state corruption across multiple mounts	2-5 mins	High (disrupts all client mounts)

Understanding the Error

The ESTALE error, commonly seen as Stale file handle, is a frequent and frustrating issue in Network File System (NFS) environments on Linux. This error is returned by the NFS server to the client when the client requests access to a file, directory, or mount point using a file handle that the server no longer recognizes as valid.

In the NFS protocol, a file handle is a unique opaque identifier generated by the server for every exported file and directory. When a client mounts an NFS export, it receives the file handle for the root of that export. As the client navigates the directory structure, it continuously obtains and caches file handles for other files and directories.

A file handle becomes 'stale' typically because the underlying file or directory has been removed, renamed, or replaced on the server, but the client still holds the old handle in its local cache. This often happens when another client, or a local process on the server itself, modifies the filesystem.
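The effect of "replacing" a file can be illustrated locally, without NFS. The sketch below assumes the server derives its handles from the file's inode (common on Linux servers, but an assumption here) and shows that a path keeps its name while its inode changes when a new file is renamed over it. The function name inode_after_replace is made up for this example.

```shell
# Local analogy only: no NFS is involved.
# Prints the inode of a path before and after a new file is renamed over it.
inode_after_replace() (
    workdir=$(mktemp -d) || exit 1

    echo "version 1" > "$workdir/report.txt"
    before=$(stat -c %i "$workdir/report.txt")

    # Write the new version next to the old one, then rename it into place.
    # Both files exist at the same time, so they cannot share an inode.
    echo "version 2" > "$workdir/report.txt.new"
    mv "$workdir/report.txt.new" "$workdir/report.txt"
    after=$(stat -c %i "$workdir/report.txt")

    rm -rf "$workdir"
    echo "$before $after"
)
```

Calling inode_after_replace prints two different inode numbers. A client that cached a handle for the first inode would be pointing at an object that no longer exists under that name.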

Common Root Causes

File or Directory Deletion: Another system or user deleted the exact file or directory the client is currently trying to read or write to.
Server Reboot or Export Changes: The NFS server was restarted, or the export configuration (/etc/exports) was modified and re-exported (exportfs -arv). If filesystems are not exported with a static fsid, the file system IDs can change, invalidating all existing handles.
Underlying Filesystem Modifications: The underlying filesystem on the server was unmounted and remounted, a snapshot was restored, or the inode structure was altered by a filesystem check (fsck).

Step 1: Diagnose

Before attempting a fix, you must verify exactly which mount point is affected. You will often see the error in application logs, cron job outputs, or when running standard filesystem commands like ls, stat, or df.

Run the df command. If it hangs or outputs a stale file handle error, you have located the problematic mount:

df -h
# Output might include: df: /mnt/nfs_data: Stale file handle

You can also check the kernel ring buffer for NFS-related error messages, which can provide insight into when the issue began:

dmesg | grep -i nfs
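Because df can block on a dead server, it may help to probe each NFS mount under a time limit instead. The following is a minimal sketch, not a definitive tool: the helper names list_nfs_mounts and probe_mount are hypothetical, and the 5-second default is an arbitrary choice. It relies on timeout and stat from GNU coreutils.

```shell
# Print the mount point of every NFS entry in a mounts table.
# $1 is optional and defaults to /proc/mounts.
# Note: mount points containing spaces appear escaped (\040) in /proc/mounts.
list_nfs_mounts() (
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
)

# Probe one directory with a time limit so a dead server cannot block the shell.
# $1 = directory, $2 = timeout in seconds (default 5).
# Prints "ok", "timeout" or "error" followed by the path.
probe_mount() (
    timeout -s KILL "${2:-5}" stat -- "$1" >/dev/null 2>&1
    case $? in
        0)       echo "ok $1" ;;
        124|137) echo "timeout $1" ;;   # 137 = killed by SIGKILL after the limit
        *)       echo "error $1" ;;     # includes ESTALE and missing paths
    esac
)
```

A typical use is `list_nfs_mounts | while read -r m; do probe_mount "$m"; done`. An "error" line points at a stale or missing mount, and a "timeout" line at an unresponsive server.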

Step 2: Fix

Approach A: The Lazy Unmount (Recommended)

The most common and effective way to clear a stale mount on the client is to unmount it and mount it again. However, a plain umount fails with a target is busy error (device is busy on older systems) as long as any process still has an open file or its working directory inside the mount.

A 'lazy' unmount (-l) detaches the filesystem from the namespace immediately and cleans up all references to the filesystem in the background as soon as it is not busy anymore. This is usually the least disruptive way to break the deadlock, although processes that still hold files on the old mount will see their I/O fail.

sudo umount -l /mnt/nfs_data

After the lazy unmount, you can safely remount the share to obtain fresh file handles from the server:

# Requires an /etc/fstab entry for this mount point:
sudo mount /mnt/nfs_data
# Or mount every /etc/fstab entry that is not already mounted:
sudo mount -a

Approach B: Force Unmount

If the NFS server is completely unreachable due to a network partition and the mount is severely hung, a force unmount might be necessary.

sudo umount -f /mnt/nfs_data

Note: umount -f can sometimes still hang indefinitely on unreachable TCP-based NFS mounts. In those specific scenarios, umount -l remains the preferred method to regain control of the terminal.
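The escalation from a normal unmount to a forced and finally a lazy one can be sketched as a small helper. This is an illustration under assumptions: the function name unmount_stale, the DRY_RUN variable, and the 10-second limit are all invented for this example, and the function must run as root. A process stuck in uninterruptible sleep may ignore even the KILL signal that timeout -k sends, so the time limit is best-effort.

```shell
# Try a normal unmount, then a forced one, then a lazy one.
# Each attempt is limited to 10 seconds (KILL is sent 5 seconds after TERM).
# With DRY_RUN=1 the commands are only printed, nothing is unmounted.
unmount_stale() (
    mnt=$1
    for opts in "" "-f" "-l"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "timeout -k 5 10 umount ${opts:+$opts }$mnt"
            continue
        fi
        # $opts is intentionally unquoted so an empty value disappears.
        timeout -k 5 10 umount $opts "$mnt" && exit 0
    done
    # Reached only in dry-run mode or when every attempt failed.
    [ "${DRY_RUN:-0}" = "1" ]
)
```

Running `DRY_RUN=1 unmount_stale /mnt/nfs_data` prints the three commands in the order they would be attempted, which is a cheap way to review the plan before running it for real.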

Approach C: Clearing the Dentry Cache

Sometimes the mount itself is functional, but specific files or subdirectories within it are throwing the stale handle error. This happens because the Linux client has cached the old inodes and directory entries (dentries) and keeps using them instead of looking the names up again on the server.

You can force the Linux kernel to drop its directory entry and inode caches.

Warning: This will drop caches for the entire system, causing a temporary but noticeable spike in disk I/O as caches rebuild from disk.

sync; echo 2 | sudo tee /proc/sys/vm/drop_caches
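Writing 1 to drop_caches frees the page cache, 2 frees dentries and inodes, and 3 frees both, which is why 2 is enough here. To see whether the drop had any effect, the dentry counters in /proc/sys/fs/dentry-state can be read before and after. The helper below is a small sketch with a hypothetical name, dentry_counts. It reports only the first two fields of that file, the total and unused dentry counts.

```shell
# Print the total and unused dentry counts.
# $1 is optional and defaults to /proc/sys/fs/dentry-state.
dentry_counts() (
    # The first two fields are nr_dentry and nr_unused; the rest are ignored.
    read -r total unused _rest < "${1:-/proc/sys/fs/dentry-state}"
    echo "total=$total unused=$unused"
)
```

Calling dentry_counts before and after the drop_caches command should show the unused count falling sharply if the cache was actually released.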

Step 3: Server-Side Prevention

To prevent this issue from recurring, ensure that your /etc/exports file on the NFS server uses static fsid options, especially if you are exporting filesystems that might be unmounted/remounted, like removable block devices or specific logical volumes (LVM).

Example /etc/exports configuration:

/srv/nfs/shared_data  10.0.0.0/24(rw,sync,no_subtree_check,fsid=100)

The no_subtree_check option also matters for stability. Subtree checking has minor security benefits but makes stale file handles considerably more likely. With it enabled, the server records information about a file's directory in the handle so that it can confirm the file lies inside the exported subtree; when a file is renamed into a different directory while a client still holds the handle, that check fails and the handle goes stale. Disabling subtree checking is recommended, and it is already the default in current nfs-utils releases.
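Both recommendations can be checked mechanically. The sketch below scans an exports file and reports entries that lack an explicit fsid= or that still enable subtree_check. The function name check_exports is hypothetical, and the parsing is deliberately simple: it assumes one export per line and does not handle backslash continuation lines or quoted paths containing spaces.

```shell
# Report export entries without fsid= or with subtree_check enabled.
# $1 is optional and defaults to /etc/exports.
check_exports() (
    awk '
        /^[ \t]*(#|$)/      { next }                          # skip comments and blank lines
        !/fsid=/            { print "no fsid: " $1 }
        /[(,]subtree_check/ { print "subtree_check: " $1 }    # does not match no_subtree_check
    ' "${1:-/etc/exports}"
)
```

An empty result means every export line carries an fsid= option and none of them turns subtree checking on explicitly.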