A quick work-around for the NFS mount bug

Sometimes, we’re stuck with older systems. Not because we like ’em, but because our customers need to keep an old version of something for one reason or another. And as a side-effect of this, we’re stuck with old bugs.

One of these oldies is the “NFS page allocation failure” bug in Linux, which manifests as failing mount commands and a syslog message like:

mount.nfs: page allocation failure. order:4, mode:0xd0

We’re running into these quite often on an openSUSE 11.4 system, but have seen them on a more recent openSUSE 12.1, too. On the 11.4 machine the problem usually persists: once we’re hit, we can no longer mount any NFS file systems.

There are two typical scenarios in which you can expect this error: one is with auto-mounted file systems, where you simply cannot access the requested file(s) because the mount fails (silently, as far as the user can tell); the other is direct invocation of the “mount” command on the command line or within a shell script.

By pure chance I’ve come across something that can help as a work-around if you’re struck by this bug during manual or script invocation: issuing a simple “umount” fixes the situation quite regularly for me!

So when your NFS mount fails, do an umount and retry the mount command. While the umount will of course fail (there’s no mounted filesystem, since the original mount failed), the following mount succeeds:

# mount nfserver:/path/to/share /your/mount/point
[insert your favorite error message here]
# umount /your/mount/point
umount: /your/mount/point: not mounted
# mount nfserver:/path/to/share /your/mount/point
#

Of course, this also works if your mount details are hidden within /etc/fstab and you’re running “mount /your/mount/point” without explicitly mentioning the NFS server.
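If you hit this inside a shell script, the retry can be wrapped in a small function. The following is only a minimal sketch of the sequence above, assuming the mount point has an /etc/fstab entry; the function name “mount_with_retry” and the MOUNT_CMD/UMOUNT_CMD override variables are made up for illustration (they let you dry-run the logic with stand-in commands) and are not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: retry a failed mount once, after a throw-away umount.
# MOUNT_CMD and UMOUNT_CMD are illustration-only overrides; by default the
# real mount and umount commands are used.
MOUNT_CMD="${MOUNT_CMD:-mount}"
UMOUNT_CMD="${UMOUNT_CMD:-umount}"

mount_with_retry() {
    mount_dir="$1"

    # First attempt; relies on an /etc/fstab entry for the mount point.
    if "$MOUNT_CMD" "$mount_dir"; then
        return 0
    fi

    # The umount is expected to fail ("not mounted"), so ignore its result.
    "$UMOUNT_CMD" "$mount_dir" 2>/dev/null || true

    # Second attempt; its exit status becomes the function's exit status.
    "$MOUNT_CMD" "$mount_dir"
}
```

Calling “mount_with_retry /your/mount/point” then returns success if either the first or the second mount attempt works, and failure otherwise. It retries only once, since I have no data on whether further retries would help.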

First of all… your mileage may vary. (Then please leave a comment and let me know!)

But secondly, and this puzzles me most: Detailed discussions of this bug usually address the point that “mount.nfs” is doing an order 4 allocation, which is considered too much. But if mount.nfs is the source of trouble, then how does invoking “umount” cure this? Remember, the umount is issued against an unmounted directory, so there’s no indication to the system that this is to be the location of a future NFS mount.

And thirdly, all of this of course does not really help in the auto-mounter scenario. I don’t have enough “sample data” to determine if the single “umount” will cure this until the next reboot, or if the failure may recur any time after. I don’t believe that an early “umount” (issued before failures happen) will be of any help, and I haven’t tested if *any* umount will do or if you have to try to un-mount the exact (previously) failing mount point. So if you’re using autofs to mount some user directories from a NAS and see the failure messages in syslog, you’ll be better off looking for a true solution to this problem.
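Since the auto-mounter fails silently, it can help to check the log for the message directly. Here is a minimal sketch that counts the occurrences; the function name “count_nfs_alloc_failures” is made up, and the default log path /var/log/messages is an assumption that depends on your syslog configuration.

```shell
#!/bin/sh
# Hypothetical helper: count "page allocation failure" lines from mount.nfs.
# The default log file is an assumption; pass another path as first argument.
count_nfs_alloc_failures() {
    log_file="${1:-/var/log/messages}"

    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'mount\.nfs: page allocation failure' "$log_file"
}
```

Running “count_nfs_alloc_failures” (typically as root, since the log may not be world-readable) prints the number of matching lines. Note that grep exits with a non-zero status when the count is 0.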

Speaking of solutions: It is suggested that setting the Linux kernel compile-time option “CONFIG_NFS_USE_NEW_IDMAPPER” will fix this. And I’ve seen reports that kernel version 3.3.0 has this issue fixed. Whether that helps, I cannot tell, but I’ve never seen this with a 3.4.x kernel, so there’s hope.
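To see where a given machine stands, you can look at the running kernel’s version with “uname -r” and at its build configuration. The following is a rough sketch, assuming the distribution installs the kernel configuration as /boot/config-<release> (some systems only offer /proc/config.gz instead); the function name “has_new_idmapper” is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel config enables the option.
# The default config path is an assumption; pass another file as argument.
has_new_idmapper() {
    config_file="${1:-/boot/config-$(uname -r)}"

    # -q: no output, only the exit status tells whether the line was found.
    grep -q '^CONFIG_NFS_USE_NEW_IDMAPPER=y' "$config_file"
}
```

Used as “has_new_idmapper && echo enabled”, this only tells you whether the option was set at build time; it says nothing about whether that actually cures the mount failures.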

Finally, I’d really like to know from you if issuing the “umount” helped you. If you’re reading this because you ran into the problem and have a chance to test my suggestion, please leave a comment and tell me if it worked. After all, I wouldn’t want to bark up the wrong tree.
