Have you ever come across the situation where your Linux boot would not mount the root file system, dropping you to a command prompt during initrd processing?
Until recently, I didn’t have to deal with this situation much, and when I did, I used a manually extended initial RAM disk (AKA “initrd”) to run programs not usually included in that environment. But thanks to the trouble I had getting a RAID1 LVM environment to boot after a recent SLES update, I had the opportunity to learn how that environment can be used even in the limited form in which it ships.
The basics
For those of you who are not that familiar with initrd, here’s a brief overview of what we’re talking about.
When booting Linux on a typical (pre-UEFI) PC, the process runs in several stages:
- The machine is initialized via the computer’s BIOS
- after detecting a proper boot disk, the first sector of that disk (the so-called boot sector) is loaded and the code therein is executed
- that code typically analyzes the “partition table” of the boot disk, identifies the partition marked as “bootable” and starts the OS bootstrap code from that partition. Please note that this “boot-sector code” therefore already needs to be OS-aware: It must be able to recognize the file system on the boot partition and needs to know which files to load/run from that file system. (That’s why Microsoft Windows has a tough time booting Linux partitions, while Linux installs can boot a parallel Windows install on the same machine: MS Windows simply does not have the support code for Linux or any other OS, as they seem to feel the need to lock you in and keep you from booting other OSes. Do I smell fear?)
- In the case of a Linux system, there’s the kernel to load plus a set of boot files that are loaded into a RAM disk – the so-called initial RAM disk, or “initrd” for short.
- Within “initrd”, there’s a program called “init” (typically a shell script) that is run and will set up everything needed to mount the disk-based root file system and run the final “init” program from there.
The main reason for this multi-stage sequence is that you may need specific drivers to operate your hardware and/or root file system properly. “initrd” and the kernel file have to reside somewhere the boot loader (e.g. GRUB) can access them with its limited implementation, which can mean that you’ll have to create a separate (small) boot partition on your disk – in a BIOS-accessible range of your disk and restricted to a file system supported by GRUB (e.g. ext2). Then, within the initrd stage, all modules required to fully support your root file system (e.g. specific hardware drivers or support for your favorite file system) are activated, the root file system is identified and mounted, and control is handed over to the init program from that disk-based root file system. “initrd” comes with only a very limited set of programs – that initial RAM disk is kept as small as possible to speed up the boot process, rather than bloating it with unnecessary files.
That initial RAM disk is tightly coupled to the kernel it is to be used with, mostly because of the included kernel modules, but also because of the support scripts that have to interact properly with it. To ease the task of creating initrds (which includes packing the required files into a container format and compressing the result), a helper program is provided, with quite some logic included nowadays. To recreate “initrd”, you may simply run the “mkinitrd” command as root and voilà, new initrd files are created for the kernels available in /boot, including support for your current root file system. Usually, this is done for you automagically when you apply kernel updates (or other updates that affect the content of the initrds).
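If you are curious what actually ends up inside such a file, you can list its content without unpacking it. The following is only a sketch: it assumes the image is a cpio archive that may be gzip-compressed (other distributions or newer releases may use a different compression), and the function names as well as the example path are my own choice, not part of any distribution tooling.

```shell
#!/bin/sh
# Sketch: peek into an initrd image without unpacking it.
# Assumption: the image is a cpio archive, optionally gzip-compressed.

# Print "gzip" if the file starts with the gzip magic bytes 1f 8b, else "other".
initrd_compression() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo gzip
    else
        echo other
    fi
}

# List the files stored in the image (needs the cpio tool).
list_initrd() {
    case $(initrd_compression "$1") in
        gzip) gzip -dc "$1" | cpio -it --quiet ;;
        *)    cpio -it --quiet < "$1" ;;
    esac
}

# Example call (the path is an assumption, adjust it to your system):
#   list_initrd /boot/initrd | grep -E 'lvm|raid'
```

Checking the magic bytes first keeps the listing function from feeding compressed data straight into cpio, which would only produce a confusing error.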
What can go wrong, may go wrong
Most of the time, all this works without trouble and you’ll never need to explore the content of your initrd. Unless, of course, there’s either some disk space problem, or a kernel upgrade that introduces incompatibilities with your current configuration.
Unfortunately, there are two points in time when things may take an unwanted turn: during the “mkinitrd” run (creation time, so to speak), and when booting your system using the initrd files (in other words, at run time).
Influencing initrd creation
When invoking “mkinitrd” on a joyfully running system, usually the only problem you’ll run into is insufficient disk space in /tmp or the target folder (typically /boot).
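A quick look at the free space before calling “mkinitrd” can save you from a half-written image. Here is a minimal sketch of such a check; the helper names and the thresholds in the example call are assumptions of mine, so pick values that match the size of your own initrd files.

```shell
#!/bin/sh
# Sketch: check free disk space before creating a new initrd.

# Print the free space (in KiB) of the file system holding the given directory.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Warn and return non-zero if the directory has less than the wanted free space.
check_space() {
    dir=$1
    need_kib=$2
    have_kib=$(free_kib "$dir")
    if [ "$have_kib" -lt "$need_kib" ]; then
        echo "WARNING: $dir has only $have_kib KiB free (want $need_kib)"
        return 1
    fi
    echo "OK: $dir has $have_kib KiB free"
}

# Example call (thresholds are assumptions):
#   check_space /boot 51200 && check_space /tmp 102400 && mkinitrd
```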
But what if you’re already facing a bad initrd and used some recovery boot medium to start up a rescue system? I, for one, typically need to specify which features to put into the to-be-created initrd, as the rescue system doesn’t provide the environment to auto-detect everything required. My steps to get there, after having booted into a rescue Linux system (e.g. from a current openSUSE USB boot stick) and being logged in as root, are:
- make sure the LVM group carrying the system’s file systems is active
- mount the root file system to /mnt
- mount all other disk-based file systems to their respective mount points beneath /mnt
- do not forget to mount the separate boot partition to /mnt/boot
- create bind mounts for /proc and /sys (“mount --bind /proc /mnt/proc; mount --bind /sys /mnt/sys”)
- “chroot /mnt”
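The steps above can be sketched as a small script. Treat it as an illustration only: the volume group name “system” is derived from the device path used in this post, while the boot partition /dev/sda1, the variable names and the dry-run switch are assumptions of mine. By default the script only prints what it would do; set DRY_RUN=0 to really execute the commands.

```shell
#!/bin/sh
# Sketch of the rescue steps; all names below are assumptions, adapt them.
DRY_RUN=${DRY_RUN:-1}                    # 1 = only print the commands
VG=${VG:-system}                         # assumed volume group name
ROOT_DEV=${ROOT_DEV:-/dev/system/root}   # logical volume with the root file system
BOOT_DEV=${BOOT_DEV:-/dev/sda1}          # assumed separate boot partition
TARGET=${TARGET:-/mnt}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_chroot() {
    run vgchange -ay "$VG"
    run mount "$ROOT_DEV" "$TARGET"
    # further file systems of your server would be mounted here
    run mount "$BOOT_DEV" "$TARGET/boot"
    run mount --bind /proc "$TARGET/proc"
    run mount --bind /sys "$TARGET/sys"
    run chroot "$TARGET"
}

prepare_chroot
```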
These commands leave me with a common starting point for all rescue activities, running within my standard server file systems (albeit with a different kernel). To re-create the initrd files, “mkinitrd” can be invoked, but should be monitored closely. If, as in the case I mentioned initially, you have a more complex boot setup like “root file system on LVM on top of RAID1”, you may need to manually specify the required features and the device carrying the root file system, e.g.
mkinitrd -d /dev/system/root -f "lvm2 dm md"
so that the required modules are included in the created file. If in doubt, you can simply run “mkinitrd -A” to create an initrd file with all available modules inside, the so-called “monster initrd”. It’ll take up quite some space and makes your system boot more slowly, but as you’re typically trying to recover from a severe problem, that’s a low price to pay to get the system to boot again – and you can always re-run “mkinitrd” later on to create initial RAM disks with only the required modules… once the system is back online.
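Before rebooting, it can be worth verifying that the modules you expect really made it into the new image. The sketch below reads a file listing on standard input and reports the modules it cannot find. The function name is hypothetical, and which modules your setup needs is an assumption you have to check yourself; dm-mod and raid1 are merely plausible candidates for LVM on top of RAID1.

```shell
#!/bin/sh
# Sketch: report which kernel modules are missing from a file listing.
# The listing is read from stdin, one path per line.
missing_modules() {
    listing=$(cat)
    status=0
    for mod in "$@"; do
        # Module files are named <mod>.ko, possibly with a compression suffix.
        if ! printf '%s\n' "$listing" | grep -Eq "/${mod}\.ko(\.[a-z0-9]+)?\$"; then
            echo "missing: $mod"
            status=1
        fi
    done
    return $status
}

# Example call (assumes a gzip-compressed cpio image at an assumed path):
#   gzip -dc /boot/initrd | cpio -it --quiet | missing_modules dm-mod raid1
```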
initrd difficulties at run-time
Sometimes, you do have an initrd file that seems to boot the system, but fails to mount the disk-based root file system. What will happen is that initrd will drop you to a command line prompt, after telling you that the root file system could not be mounted:
invalid root filesystem -- exiting to /bin/sh
Unfortunately, you only have very limited options at this stage: there are close to no typical command line tools included in the initial RAM disk, so you’ll have to improvise. However, if all required features were included in the initrd file and only some run-time configuration problem caused the above message, then you have a good chance of recovering from here: all you have to do is make sure that the root file system’s device becomes available, and then simply leave this shell via “exit”. The initrd init code will retry mounting the root file system (from the now available root device) and, on success, will continue the boot as if nothing had happened.
A typical case for me is when I have to manually set up the RAID1 and activate the volume group, as some change caused the automatic procedure to fail to correctly auto-configure everything for me.
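What that manual setup may look like is sketched below. It assumes that mdadm and the LVM tools were included in your initrd, and the array name /dev/md0, the member partitions /dev/sda2 and /dev/sdb2 and the volume group name “system” are placeholders for your own layout. The wrapper function and the dry-run switch exist only so the sketch can be tried safely; at the real initrd prompt you would simply type the two commands it prints.

```shell
#!/bin/sh
# Sketch only: array name, member partitions and volume group are assumptions.
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: activate_root <md device> <volume group> <member partition>...
activate_root() {
    md_dev=$1
    vg=$2
    shift 2
    run mdadm --assemble "$md_dev" "$@"   # bring up the RAID1 array from its members
    run vgchange -ay "$vg"                # activate the volume group on top of it
}

activate_root /dev/md0 system /dev/sda2 /dev/sdb2
# Afterwards, "exit" leaves the shell so the initrd init code can retry the mount.
```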
If you’re a sysadmin and not just a drive-by visitor to this blog, I strongly recommend becoming familiar with both the creation and the typical content of the initial RAM disk of your Linux distribution – it will pay off when you hit that fatal message from above and are short on time to get your server back online.