Installing openSUSE Leap 42.1 on bcache root FS

Now that openSUSE Leap has been around for some time, I wanted to give it a try on one of our test SAN/NAS servers. One of the most basic elements of these SAN/NAS servers is using bcache to employ transparent block-level caching to SSDs, without the need for specialized RAID adapters. This allows for less expensive HDDs and/or more clients using these machines.

The current servers are running openSUSE 13.1, and using bcache right from the start (i.e. to put the root file system on a bcache device) was not for the faint of heart. So how have things evolved with Leap 42.1?

bcache on openSUSE Leap 42.1

openSUSE Leap 42.1 currently ships with kernel 4.1.12 (and the recent updates push that to 4.1.13), and Vojtech Pavlik (Director SUSE Labs) reported on the bcache mailing list that SUSE has incorporated the current “patch collection” into their kernel. The latter is good news, as the standard upstream kernel contains bcache in a rather buggy (and old) version. Only recently has the maintainer picked up and forwarded months’ worth of patches upstream, where they will be part of kernel 4.4.

Unfortunately, kernel support for bcache isn’t everything. As it stands, the current Leap 42.1 installation images not only require manual intervention (the installer is not yet prepared to handle bcache), but also lack essential tools (“make-bcache”, required to prepare new devices for use as bcache cache and backing devices) and ship versions of fundamental subsystems that are not yet bcache-aware (LVM needs a configuration update to work properly on bcache devices). Things have progressed, though, and SUSE will be making an effort to further ease the task of setting up bcache-based configurations. We’ll just have to see how fast this happens, and to what extent.
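
If you want to check what a given installation image already brings along, a quick test from a text console might look like the following sketch (the commands and paths are simply those used later in this article):

    # is the userspace tool there?
    which make-bcache || echo "make-bcache is missing"
    # does the kernel offer bcache? (the sysfs directory appears once the module is loaded)
    modprobe bcache && ls -d /sys/fs/bcache
    # does LVM already know about bcache devices?
    grep -n types /etc/lvm/lvm.conf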

Setting up a new system with bcache for root fs

Here’s what you will actually have to do when setting up a new server – it boils down to only two “break points” for manual steps, and you won’t have to worry about running out of time, so there’s no need to hurry through the steps when following these instructions:

  1. Boot your openSUSE Leap 42.1 installer
    Start your installation with the regular steps required to boot up the shipped installer: insert your boot media, e.g. a USB stick or DVD, or configure your server’s IPMI module to provide the corresponding ISO as a virtual DVD, power up your server and make sure the installer device is the one you boot from.
  2. Break point 1: License agreement
    When you reach the screen displaying the license agreement, it’s time for the major manual steps to set up bcache. As these are a bit longish, I’ll give a detailed description following this list.
  3. Configure the install and include “bcache-tools” to the list of software packages
    Once the manual bcache setup is completed, you can switch back to the “license screen” and continue your installation. I strongly recommend including the online repositories right from the start – this makes sure you fetch any updated packages. With earlier versions of openSUSE, this was a must to get a working system, and it won’t hurt now either.
    When setting up the disks and file systems, you likely cannot use the suggested layout (for me, the installer always makes false assumptions because of its bcache-agnostic nature). I recommend selecting the “expert configuration” – make sure you delete any extra partitioning that was pre-configured as part of the installer’s auto-created proposal, and then set up your layout as required.
    Additionally, you have to make sure to include “bcache-tools” in your selection of RPMs to install. Without this, you’ll miss the Dracut script that sets up the boot environment to handle your bcache device(s) – hence your root device cannot be set up and any boot will fail.
  4. Break point 2: Initial reboot
    Unless you already fixed this while the RPMs were being installed to disk (and before the boot loader configuration ran as part of the standard installation routine), you’ll have to suspend the initial reboot and run two steps manually (a command sketch follows right after this list):

    1. Modify the on-disk /etc/lvm/lvm.conf
      Like the installation environment, the on-disk installation has an LVM version that is partly bcache-agnostic. Hence you’ll have to add the line types = [ "bcache", 16 ] to the “devices” section of the on-disk “lvm.conf” as well.
    2. Re-run Dracut’s mkinitrd
      Once you’ve fixed LVM, you’ll have to re-run “mkinitrd” so that the initial RAM disk comes with an LVM configuration that enables handling bcache devices.
  5. Reboot and go!
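
To illustrate the two steps, here is a minimal sketch. It assumes the installer still has the target system mounted at /mnt and that you run the commands from a text console before allowing the reboot; depending on the state of the installer, you may additionally have to bind-mount /proc, /sys and /dev below /mnt first (as shown in the repair section further down):

    # add types = [ "bcache", 16 ] to the devices { } section of the installed system
    vi /mnt/etc/lvm/lvm.conf
    # re-create the initial RAM disk inside the installed system
    chroot /mnt mkinitrd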

Manually activating bcache during installation / repair

The installation environment (as well as the repair environment from the same ISO) lacks the features to automatically set up and/or activate bcache configurations. Therefore, you’ll have to set up your bcache device(s) manually during installation and, likewise when booting into repair mode, activate the devices manually.

From what I’ve seen, the installation routines do not yet recognize the actual bcache devices (“/dev/bcache[0-9]*”), so I decided to set up LVM using “/dev/bcache*” as physical volumes. (Not much I had to re-decide there: I wanted LVM anyhow.) If you want a different setup, e.g. BTRFS directly on the bcache device, you’ll have to test for yourself whether the installer recognizes the resulting BTRFS file system and lets you configure your further file system layout. YMMV.

And just for completeness’ sake, I would like to mention that I’m using an MD-RAID6 for my backing device and MD-RAID1 (two SSDs) for the caching device.

So in my case, the initial installation requires the following steps:

  • at the “license screen” of the installer, switch to some text-mode console, e.g. tty2
  • create the backing device, in my case by partitioning the corresponding HDDs and using “mdadm” to create the RAID6 array.
  • create the caching device, in my case by partitioning the corresponding SSDs and using “mdadm” to create the RAID1 array.
  • fetch the “make-bcache” tool from some other machine
    Unfortunately, the “make-bcache” tool (which is required to prepare block devices to be used as backing and caching devices for bcache) is not included in the installation environment. Therefore, you’ll have to install the “bcache-tools” RPM on some other openSUSE Leap machine and transfer “/usr/sbin/make-bcache” to the machine you’re currently installing. I’m using “scp” for this, but of course, whatever works for you is fine.
  • set up the backing and caching block devices
    Using the “make-bcache” binary that you just transferred to the installation environment, you can prepare both required devices in a single step:
    make-bcache -C /dev/md127 -B /dev/md126
    You’ll likely have to replace the above device names to match your situation, and may want to tune bcache using the available parameters. On the other hand, using the defaults has worked well enough for me.
  • prepare the file system so that the installation environment’s lvm.conf can be modified
    This step is only required in the installation environment – when you boot into repair mode, you can skip this step and modify /etc/lvm/lvm.conf directly.

    • mv /etc/lvm /etc/lvm.orig
    • mkdir /etc/lvm
    • cp -rp /etc/lvm.orig/. /etc/lvm
  • modify /etc/lvm/lvm.conf to be able to handle bcache devices
    • vi /etc/lvm/lvm.conf
    • In the “devices” section (which starts at line 33), add a line that reads types = [ "bcache", 16 ]:
      31 # This section allows you to configure which block devices should
      32 # be used by the LVM system.
      33 devices {
      34     # added support for bcache devices
      35     types = [ "bcache", 16 ]
      36
      37     # Where do you want your volume groups to appear ?
      38     dir = "/dev"
      39
  • load the “bcache” module (“modprobe bcache”)
  • register the bcache backing and cache devices
    The virtual bcache device is made up of a backing device and a cache device. The bcache driver needs to be told which devices to use, so that it can present them to the system as a combined /dev/bcache0 (and subsequent) device. This is achieved by “registering” the underlying devices via sysfs.
    The device names obviously depend on your setup – for the two MD-RAID devices that I use, the two echo commands need to be

    • “echo /dev/md127 > /sys/fs/bcache/register”
    • “echo /dev/md126 > /sys/fs/bcache/register”
  • Create an LVM volume group
    Both because I prefer it that way and because the openSUSE installer currently does not seem to support bcache devices, creating an LVM volume group is the way to go.

    • make /dev/bcache0 a “physical volume” (“pvcreate /dev/bcache0”)
    • create a volume group containing that PV (“vgcreate YourPreferredVgName /dev/bcache0”)
      This command will fail silently if you did not change lvm.conf as described above. You may want to invoke “vgdisplay YourPreferredVgName” to verify that the volume group was actually created.
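
For reference, here is the whole sequence condensed into a single command sketch. The RAID levels match my setup, while the partition names, MD device numbers and the volume group name are only examples and will differ on your hardware:

    # backing device: RAID6 across the HDD partitions
    mdadm --create /dev/md126 --level=6 --raid-devices=4 /dev/sd[a-d]1
    # caching device: RAID1 across the SSD partitions
    mdadm --create /dev/md127 --level=1 --raid-devices=2 /dev/sd[ef]1
    # format cache (-C) and backing (-B) device in one go, using the copied binary
    ./make-bcache -C /dev/md127 -B /dev/md126
    # make LVM accept bcache devices (after the /etc/lvm preparation described above),
    # then load the driver and register both devices
    vi /etc/lvm/lvm.conf
    modprobe bcache
    echo /dev/md127 > /sys/fs/bcache/register
    echo /dev/md126 > /sys/fs/bcache/register
    # put LVM on top of the resulting bcache device
    pvcreate /dev/bcache0
    vgcreate YourPreferredVgName /dev/bcache0
    vgdisplay YourPreferredVgName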

Once these steps are completed, you have successfully prepared your bcache-backed volume group and can switch back to the installation process.

When you’re booting into repair mode, you’ll need only three of the above steps to get access to your bcache-backed volumes:

  • modify lvm.conf
  • load the bcache module
  • register the bcache backing and cache devices
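
In practice, a repair-mode session thus starts with something along these lines (device names again being those from my setup); you’ll most likely also want to activate the volume group afterwards:

    # let LVM accept bcache devices: add types = [ "bcache", 16 ] to the devices { } section
    vi /etc/lvm/lvm.conf
    # load the driver and register cache and backing device
    modprobe bcache
    echo /dev/md127 > /sys/fs/bcache/register
    echo /dev/md126 > /sys/fs/bcache/register
    # activate the volume group(s) found on /dev/bcache0
    vgchange -ay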

Creating a new initial RAM disk

It is not unusual to end up with a system that has an insufficient boot environment, most likely because of an incomplete “initrd” (“initial RAM disk”). There’s more than one way to skin a cat, but here’s my preferred way of setting up the repair environment so that calling “mkinitrd” will work:

  • Once you have a command line prompt when booting into repair mode, make the bcache LVM available (modify lvm.conf, load the bcache module, register the bcache backing and cache devices)
  • mount the root FS to /mnt (e.g. “mount /dev/YourPreferredVgName/root /mnt”)
  • if you subdivided your file systems, make sure you have /usr, /tmp, /var and /var/log mounted accordingly below /mnt
  • make sure your /boot partition is mounted beneath /mnt/boot
  • bind-mount /sys, /dev and /proc
    mount -o bind /sys /mnt/sys
    mount -o bind /dev /mnt/dev
    mount -o bind /proc /mnt/proc
  • “change root” to /mnt
    chroot /mnt
  • do your voodoo to fix any errors and re-create the initrd by calling “mkinitrd”
  • finally, exit the “chroot” environment (“exit” or Ctrl-D) and umount in reverse order
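
Put together, and assuming a single root LV plus a separate /boot partition, the whole sequence might look like the following sketch (device names are examples only):

    mount /dev/YourPreferredVgName/root /mnt
    mount /dev/sda1 /mnt/boot              # whatever device holds your /boot
    mount -o bind /sys /mnt/sys
    mount -o bind /dev /mnt/dev
    mount -o bind /proc /mnt/proc
    chroot /mnt mkinitrd                   # or "chroot /mnt" for an interactive session
    umount /mnt/proc /mnt/dev /mnt/sys /mnt/boot /mnt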

The future

As bcache has recently picked up some pace again and SUSE has expressed some interest in integrating it better into their distribution, we’ll likely see improvement with future installation images.

The first and easy step is to integrate “bcache-tools” into the installation environment, so that “make-bcache” and the rule sets to activate bcache devices are available and active out of the box.

A second action, needed at the same time as inclusion of “bcache-tools”, is to either update LVM or to extend the default lvm.conf to make LVM bcache-aware.

Personally, I can live with running manual extra steps when setting up a new system, but of course it’d be great if the default installer were bcache-aware, too: A first level would be to present the bcache device in the file system setup screens, so that you can use it like any other block device and create e.g. BTRFS file systems on it by means of the standard installer. The final icing would be full support, as in “being able to configure bcache devices via YaST” and “support for creating bcache devices via AutoYaST”.

I believe that the first two changes are easy to implement and won’t break anything – so I will ask SUSE to work on including this with the next release of openSUSE Leap (and the SLES versions based on Leap, too). Changing YaST (and AutoYaST) will likely be much more work and is only justified if there’s a substantial number of users who want to install with bcache-based root file systems. So if you’re interested in this, please leave a comment below and I’ll let my SUSE contacts know 😉
