Turning it off and on again

Linux initrd command line

Posted on January 5, 2014 by j mozdzen

Have you ever come across the situation where your Linux boot would not mount the root file system, dropping you to a command prompt during initrd processing?

Until recently, I didn’t have to deal with this situation too much, and when I did, I used a manually extended initial RAM disk (AKA “initrd”) to run programs usually not included in that environment. But due to the trouble I had getting a RAID1 LVM environment to boot after a recent SLES update, I had the opportunity to learn how that environment can be used even in the limited form it ships in.

The basics

For those of you who are not that familiar with initrd, here’s a brief overview of what we’re talking about.

When booting Linux on a typical (pre-UEFI) PC, the process runs in several stages:

The machine is initialized via the computer’s BIOS.
After detecting a proper boot disk, the first sector of that disk (the so-called boot sector) is loaded and the code therein is executed.
That code typically analyzes the “partition table” of the boot disk, identifies the partition marked as “bootable” and starts the OS bootstrap code from that partition. Please note that this “boot-sector code” therefore already needs to be OS-aware: It must be able to recognize the file system on the boot partition and needs to know which files to load/run from that file system. (That’s why Microsoft Windows has a tough time booting Linux partitions, while Linux installs can boot a parallel Windows install on the same machine: MS Windows simply does not have the support code for Linux or other OSes, as they seem to feel the need to lock you in and keep you from booting other OSes. Do I smell fear?)
In case of a Linux system, there’s the kernel to load plus a set of boot files that are loaded into a RAM disk – the so-called initial RAM disk, or “initrd” for short.
Within “initrd”, there’s a program called “init” (typically a shell script) that is run and sets up everything needed to mount the disk-based root file system and run the final “init” program from there.

The main reason for this multi-stage sequence is that you may need specific drivers to operate your hardware and/or root file system properly. “initrd” and the kernel file have to reside somewhere the boot loader (e.g. GRUB) can access them with its limited implementation, which can mean that you’ll have to create a separate (small) boot partition on your disk – in a BIOS-accessible range of your disk and restricted to a file system supported by GRUB (e.g. ext2). Then, within the initrd stage, all modules required to fully support your root file system (e.g. specific hardware drivers or support for your favorite file system) are activated, the root file system is identified and mounted, and control is handed over to the init program from that disk-based root file system. “initrd” comes with a very limited set of programs only – the initial RAM disk is kept as small as possible to speed up the boot process, rather than bloating it with unnecessary files.

That initial RAM disk is tightly dependent on the kernel it is to be used with, mostly because of the included kernel modules, but also because of the support scripts that have to interact properly with them. To ease the task of creating initrds (which includes packing the required files into a container format and compressing it as well), a helper program is provided, with quite some logic included nowadays. To recreate “initrd”, you may simply run the “mkinitrd” command as root and voilà, new initrd files are created for the kernels available in /boot, including the support for your current root file system. Usually, this is done for you automagically when you apply kernel updates (or other updates that affect the content of the initrds).

What can go wrong, may go wrong

Most of the time, all this works without trouble and you’ll never need to explore the content of your initrd. Unless, of course, there’s either some disk space problem, or a kernel upgrade that introduces some incompatibility with your current configuration.

Unfortunately, there are two points in time when things may take an unwanted turn: During the “mkinitrd” run (so to say, at creation time), and when booting your system using initrd files (in other words, at run time).

Influencing initrd creation

When invoking “mkinitrd” on a joyfully running system, usually the only problem you’ll run into is that of too limited disk space in /tmp or the target folder (typically /boot).

But what if you’re already facing a bad initrd and used some recovery boot medium to start up a rescue system? I, for one, typically need to specify which features to put into the to-be-created initrd, as the rescue system doesn’t provide the environment to auto-detect everything required. My steps to get there, after having booted into a rescue Linux system (e.g. from a current openSUSE USB boot stick) and being logged in as root, are:

make sure the LVM group carrying the system’s file systems is active
mount the root file system to /mnt
mount all other disk-based file systems to their according mount point beneath /mnt
do not forget to mount the separate boot partition to /mnt/boot
create copy mounts for /sys and /proc (“mount --bind /proc /mnt/proc; mount --bind /sys /mnt/sys”)
“chroot /mnt”
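
The steps above can be sketched as a small script. Take it as a sketch under assumptions, not as a tested recovery procedure: the volume group name “system” and the root volume /dev/system/root are taken from the mkinitrd example further down, while /dev/md0 as the boot partition is only a placeholder for whatever device carries /boot on your machine. By default the script only prints the commands; set DRY_RUN=0 to actually execute them as root.

```shell
#!/bin/sh
# Sketch: prepare a chroot of the installed system from a rescue system.
# Assumptions (adjust to your machine): volume group "system", root volume
# /dev/system/root, separate boot partition on /dev/md0 (placeholder).
VG=${VG:-system}
ROOT_DEV=${ROOT_DEV:-/dev/system/root}
BOOT_DEV=${BOOT_DEV:-/dev/md0}
TARGET=${TARGET:-/mnt}

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_chroot() {
    # activate the volume group carrying the system's file systems
    run vgchange -ay "$VG" || return 1
    # mount the root file system
    run mount "$ROOT_DEV" "$TARGET" || return 1
    # any other disk-based file systems would be mounted here;
    # the separate boot partition must not be forgotten
    run mount "$BOOT_DEV" "$TARGET/boot" || return 1
    # copy mounts for /proc and /sys
    run mount --bind /proc "$TARGET/proc" || return 1
    run mount --bind /sys "$TARGET/sys" || return 1
    # enter the installed system
    run chroot "$TARGET"
}

prepare_chroot
```

Stopping at the first failing step is deliberate: a chroot into a half-mounted tree is more confusing than helpful.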

These commands leave me with a common starting point for all rescue activities, running within my standard server file systems (albeit with a different kernel). To re-create the initrd files, “mkinitrd” can be invoked, but should be monitored closely. If, like in the case I mentioned initially, there are more complex boot setups like “root file system on LVM on top of RAID1”, you may need to manually specify the required features and the device carrying the root file system. For example:

mkinitrd -d /dev/system/root -f "lvm2 dm md"

so that the required modules are included in the created file. If in doubt, you can simply run “mkinitrd -A” to create an initrd file with all available modules inside, the so-called “monster initrd”. It’ll take up quite some space and lets your system boot more slowly, but as you’re typically trying to recover from a severe problem, that’s a low price to pay to get the system to boot again – and you can always re-run “mkinitrd” later on to create initial RAM disks with only the required modules… once the system is back online.
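
If you want both attempts in one go, a wrapper along the following lines might do. It is only a sketch: the function name rebuild_initrd and the MKINITRD override are my own inventions for illustration, and the automatic fallback to “-A” is a convenience, not something mkinitrd does by itself.

```shell
#!/bin/sh
# Sketch: build a targeted initrd, falling back to the "monster initrd".
# MKINITRD can be overridden (e.g. for testing); it defaults to the real command.
MKINITRD=${MKINITRD:-mkinitrd}

rebuild_initrd() {
    root_dev=$1    # device carrying the root file system
    features=$2    # space-separated feature list, passed as ONE argument
    if "$MKINITRD" -d "$root_dev" -f "$features"; then
        echo "initrd built with features: $features"
    else
        echo "targeted build failed, creating monster initrd" >&2
        "$MKINITRD" -A
    fi
}

# Usage (as root, inside the chroot):
#   rebuild_initrd /dev/system/root "lvm2 dm md"
```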

initrd difficulties at run-time

Sometimes, you do have an initrd file that seems to boot the system, but fails to mount the disk-based root file system. What will happen is that initrd will drop you to a command line prompt, after telling you that the root file system could not be mounted:

invalid root filesystem -- exiting to /bin/sh

Unfortunately, you only have very limited options at this stage: There are close to no typical command line tools included in the initial RAM disk, so you’ll have to improvise. But if all required features were included in the initrd file and only some run-time configuration problem caused the above message, then you have a good chance of recovering from here: All you have to do is make sure that the root file system’s device becomes available, and then simply leave this shell via “exit”. The initrd init code will retry mounting the root file system (from the now available root device) and, on success, will continue the boot as if nothing had happened.

A typical case for me is when I have to manually set up the RAID1 and activate the volume group, as some change caused the automatic procedure to fail to correctly auto-configure everything for me.
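
For that case, the commands boil down to something like the sketch below. All names are hypothetical placeholders: a RAID1 /dev/md1 built from /dev/sda2 and /dev/sdb2, and a volume group called “system”. It also assumes that mdadm and the LVM tools made it into the initrd, which is only the case if the corresponding features were included at creation time. By default the commands are only printed; in the emergency shell you would simply type them.

```shell
#!/bin/sh
# Sketch: make the root device available from the initrd emergency shell.
# Hypothetical names: RAID1 /dev/md1 built from /dev/sda2 and /dev/sdb2,
# volume group "system". Prints the commands unless DRY_RUN=0 is set.
MD_DEV=${MD_DEV:-/dev/md1}
MD_PARTS=${MD_PARTS:-"/dev/sda2 /dev/sdb2"}
VG=${VG:-system}

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

activate_root() {
    # assemble the mirror from its member partitions
    # ($MD_PARTS is left unquoted on purpose so that it splits into words)
    run mdadm --assemble "$MD_DEV" $MD_PARTS || return 1
    # activate the volume group so that the root volume appears
    run vgchange -ay "$VG" || return 1
}

activate_root
# Afterwards, leave the emergency shell with "exit" so that the initrd
# init code retries mounting the root file system.
```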

If you’re a sysadmin and not just a drive-by visitor to this blog, I strongly recommend becoming familiar with both the creation and the typical content of the initial RAM disk of your Linux distribution – it will pay off when you hit that fatal message from above and are short on time to get your server back online.

Posted in howto, Linux

initrd woes with SLES11 updates

Posted on January 3, 2014 by j mozdzen

As the holiday season is a time of few active users, it’s considered a good time for systems maintenance. Updating a cluster of SLES servers is no exception to this, and so we took a few moments to bring a set of servers to the latest level of SLES 11 SP3.

Interestingly, unlike with earlier updates, running the command line update (“zypper up”) reported problems with “mkinitrd”:

Installation of drbd-kmp-default-8.4.4_3.0.101_0.8-0.20.1 failed:
 (with --nodeps --force) Error: Subprocess failed. Error: RPM failed:
 Kernel image:   /boot/vmlinuz-3.0.101-0.8-default
 Initrd image:   /boot/initrd-3.0.101-0.8-default
 KMS drivers:     radeon
 Root device:    /dev/mapper/system-root (mounted on / as ext3)
 Resume device:  /dev/md2
 Device disk!by-id!md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5 not found in sysfs
 Script /lib/mkinitrd/setup/72-block.sh failed!
 There was an error generating the initrd (1)
 error: %post(drbd-kmp-default-8.4.4_3.0.101_0.8-0.20.1.x86_64) scriptlet failed, exit status 1
 Abort, retry, ignore? [a/r/i] (a):

This wasn’t the only place where “mkinitrd” was called, but interestingly it was the only place where it caused the update process to fail – go figure… and always check your update logs!
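
Checking the logs can be as simple as searching the saved update output for the two failure messages quoted above. The following is a minimal sketch; the function name initrd_errors and the file name update.log are made up for illustration.

```shell
#!/bin/sh
# Sketch: report mkinitrd failures found in a saved update log.
# The two patterns are the failure messages quoted in this post.
initrd_errors() {
    grep -n -e 'Script .* failed!' \
            -e 'There was an error generating the initrd' "$1"
}

# Usage: save the update output first, then scan it
# (update.log is a hypothetical file name):
#   zypper up 2>&1 | tee update.log
#   initrd_errors update.log && echo "initrd problems found"
```

The function returns success only if at least one of the messages was found, so it can be used directly in a condition.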

Running “mkinitrd” manually reported the same difficulties, which is not much of a surprise:

server01:/boot/grub # mkinitrd
 Kernel image:   /boot/vmlinuz-3.0.101-0.8-default
 Initrd image:   /boot/initrd-3.0.101-0.8-default
 KMS drivers:     radeon
 Root device:    /dev/mapper/system-root (mounted on / as ext3)
 Resume device:  /dev/md2
 Device disk!by-id!md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5 not found in sysfs
 Script /lib/mkinitrd/setup/72-block.sh failed!
 There was an error generating the initrd (1)
 server01:/boot/grub #

But what’s all this about? The disruption is obviously caused by the “missing” device (“Device disk!by-id!md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5 not found in sysfs”), so here’s some background on the setup:

The servers all come with a local disk plus a Fibre Channel connection to a SAN server. In order to provide optimum availability, the local disk’s partitions are mirrored to a similarly partitioned SAN LUN via Linux MD. There are three partitions: one for /boot, one as an LVM physical volume, and lastly a swap partition for the sake of completeness. These partitions (from both the local disk and the LUN) are used to create /dev/md0, /dev/md1 and /dev/md2.

The device in question is /dev/md1, as we could confirm by looking at /dev/disk/by-id:

server0103:~ # ls -l /dev/disk/by-id/md-uuid-*3ab5
lrwxrwxrwx 1 root root 9 Jan  3 16:43 /dev/disk/by-id/md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5 -> ../../md1

So it was the LVM “physical volume” that somehow caused problems… but why? Up to now, everything went quite smoothly, both during uptime and for previous upgrades.

The riddle’s solution can be found in /etc/lvm/lvm.conf: These servers handle a large number of dynamically attached LUNs that carry file systems for Xen virtual machines. But to keep the Dom0 (“host”) LVM from picking up volume groups on these additional disks, the servers were configured to only use the specific RAID1 created from the two dedicated partitions:

[... /etc/lvm/lvm.conf ...]
# we know what we're looking for
filter = [ "a|/dev/disk/by-id/md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5|", "r|.*|" ]

It seems that the mkinitrd scripts somehow pick up that information and try to look up the named device in sysfs. Why they don’t find the obviously existing device has yet to be determined, but as a quick work-around we modified lvm.conf to use the (in our case persistent) generic device name:
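
One observation that may help when digging deeper: the name in the error message looks like the configured path below /dev with each “/” replaced by “!”, which is how sysfs spells device names that contain a slash. Whether the mkinitrd scripts really derive the name this way is an assumption on my side, not something I have verified in their code. The sketch below (all function names are hypothetical helpers) compares that unresolved name with the kernel name the symlink resolves to.

```shell
#!/bin/sh
# Sketch: compare a device path as configured with what sysfs knows.
# SYSFS_BLOCK can be overridden for testing; /sys/block is the real location.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Spell a /dev path the way sysfs would: drop "/dev/", turn "/" into "!".
sysfs_name() {
    printf '%s\n' "${1#/dev/}" | tr '/' '!'
}

# Follow symlinks to the real device node and return its kernel name.
kernel_name() {
    basename "$(readlink -f "$1")"
}

# Report whether a name has an entry below $SYSFS_BLOCK.
in_sysfs() {
    if [ -e "$SYSFS_BLOCK/$1" ]; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Usage on the affected server:
#   dev=/dev/disk/by-id/md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5
#   in_sysfs "$(sysfs_name "$dev")"     # the unresolved name from the error
#   in_sysfs "$(kernel_name "$dev")"    # the resolved name, md1 in our case
```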

[... /etc/lvm/lvm.conf ...]
# we know what we're looking for
#filter = [ "a|/dev/disk/by-id/md-uuid-e76f9f53:b7553f91:5ed64d64:e65d3ab5|", "r|.*|" ]
filter = [ "a|/dev/md1|", "r|.*|" ]

Once that change was in effect, both “mkinitrd” and the whole update could be completed.
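
For those unsure how such a filter list is read: LVM walks through the patterns in order, the first one that matches decides whether the device is accepted (“a”) or rejected (“r”), and a device matching no pattern is accepted. The sketch below emulates that for a single device path. It is a simplification: the patterns are evaluated by grep -E instead of LVM’s own regex code, alternative path names of the same device are ignored, and the function name filter_decision is made up.

```shell
#!/bin/sh
# Sketch: emulate how an lvm.conf filter list decides on ONE device path.
# Simplifications: grep -E stands in for LVM's regex engine, and other
# path names (aliases) of the same device are not considered.
filter_decision() {
    dev=$1
    shift
    for rule in "$@"; do
        # first character: a (accept) or r (reject)
        action=$(printf '%s' "$rule" | cut -c1)
        # strip the action, the leading and the trailing delimiter
        regex=$(printf '%s' "$rule" | cut -c3- | sed 's/.$//')
        if printf '%s\n' "$dev" | grep -E -q -- "$regex"; then
            case $action in
                a) echo accept; return 0 ;;
                r) echo reject; return 0 ;;
            esac
        fi
    done
    echo accept    # no pattern matched: the device is accepted
}

filter_decision /dev/md1  "a|/dev/md1|" "r|.*|"
filter_decision /dev/sdc  "a|/dev/md1|" "r|.*|"
filter_decision /dev/md10 "a|/dev/md1|" "r|.*|"
```

The three calls print accept, reject and accept. The last one is worth noting: the pattern is not anchored, so /dev/md10 would slip through as well. With only three MD devices that does not matter here, but “a|^/dev/md1$|” would be the stricter spelling.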

Posted in Linux

Brother DCP-J925DW: Problems when scanning via SANE

Posted on December 6, 2013 by j mozdzen

We’ve had pretty good experiences when using Brother printers and scanners in our Linux world – Brother is one of those companies that do not treat their Linux customers as second class. It was quite obvious we’d put Brother devices to the top of our list when shopping for new hardware.

So when we needed a printer/scanner combo for low-volume printing, some occasional scan jobs (but with automated document feed) and WLAN capabilities, we decided to try Brother’s DCP-J925DW device: It’s an ink jet printer with four separate ink tanks (CMYK), even allows for the occasional printing of a CD/DVD label (right on the printable medium, without having to handle separate label stickers), comes with a scanner with ADF and is WLAN-capable.

Posted in CUPS, Linux, SANE