"That which is overdesigned, too highly specific, anticipates outcome; the anticipation of outcome guarantees, if not failure, the absence of grace."
-- William Gibson, All Tomorrow's Parties
Attack of the zfsboot Clones.

I have an OpenSolaris box in pilot at the moment, running build 74. It uses Lori Alt's patched miniroot so I can set up a rootpool and do a profile (network, via Jumpstart) install. It works really well.

Yesterday the box went into a reboot loop, and as there appear to be issues with b74, I figured I would finally get around to learning how to use BFU (which is a change-aware wrapper around cpio that writes to /; it's not something you can back out of). But before I did that, I would need to figure out how to boot from a ZFS clone. If the BFU goes south, or if the new build bricks the box, I need a way to boot back into the old system. It's the poor man's LiveUpgrade, I suppose, but it's still way cool and (I think) much easier.

So that's the goal here: Take a snapshot of the current system, clone the snapshot so it's writable, and then upgrade the clone. This way we can BFU the system and still have a fallback in the event that the BFU fails, or the new OS/Net build bricks our box.

Tim Foster had already written a blog post about how easy this was, so I wasn't expecting to run into any problems.

First, grab the ON build tools and the BFU archives for the build you care about.


[root@octopus]:[~]# cd /tmp
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/on-bfu-nightly-osol-nd.i386.tar.bz2

You probably want to do that in /tmp (which is backed by swap) so that when you take your snapshots, big random files are not littering the filesystem forever.

Set up your build environment:

[root@octopus]:[/tmp]# bunzip2 on-bfu-nightly-osol-nd.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf on-bfu-nightly-osol-nd.i386.tar
[root@octopus]:[/tmp]# bunzip2 SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf SUNWonbld.i386.tar 
[root@octopus]:[/tmp]# cd onbld/
[root@octopus]:[/tmp/onbld]# pkgadd -d . SUNWonbld 
[root@octopus]:[/tmp/onbld]# cd
[root@octopus]:[~]# export FASTFS="/opt/onbld/bin/i386/fastfs"
[root@octopus]:[~]# export GZIPBIN="/usr/bin/gzip"
[root@octopus]:[~]# export BFULD="/opt/onbld/bin/`uname -p`/bfuld"
[root@octopus]:[~]# export PATH="/opt/onbld/bin:/opt/onbld/bin/`uname -p`:$PATH"
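
Before going any further it's worth confirming the tools actually landed on your PATH. This is just a minimal sketch: check_tools is a name I made up for illustration, and it assumes a POSIX-style shell such as bash or ksh where `command -v` is available.

```shell
# Hypothetical helper: report any of the named commands that cannot be
# found on PATH. Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return $missing
}

# bfu and acr are the two tools used later in this post.
check_tools bfu acr || echo "fix PATH before continuing"
```

If it complains, recheck the PATH export above before running anything destructive.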

Now we need to take a snapshot of our current rootfs, clone it so it is writable, and mount it. In my setup, the rootpool is a legacy mount, and anything under it is also going to inherit the legacy mountpoint property, so the clone needs an explicit mountpoint before we can write to it.


[root@octopus]:[~]# zfs snapshot rootpool/b74@upgrade
[root@octopus]:[~]# zfs clone rootpool/b74@upgrade rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=/rootpool/b75 rootpool/b75
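
If you expect to do this for more than one build, the three commands above generalize easily. Here is a rough sketch that only prints the commands instead of running them, so you can eyeball them first. The function name clone_be_cmds is made up, and the @upgrade snapshot name and the /POOL/NEW_BE mountpoint are simply carried over from my setup as assumptions.

```shell
# Hypothetical dry-run helper: print (do not run) the commands that
# snapshot an existing boot environment and clone it to a new one.
# Usage: clone_be_cmds POOL OLD_BE NEW_BE
clone_be_cmds() {
    pool=$1
    old=$2
    new=$3
    echo "zfs snapshot ${pool}/${old}@upgrade"
    echo "zfs clone ${pool}/${old}@upgrade ${pool}/${new}"
    echo "zfs set mountpoint=/${pool}/${new} ${pool}/${new}"
}

clone_be_cmds rootpool b74 b75
```

Once the output looks right, pipe it to sh as root to actually run it.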

Now it's time to do the actual upgrade. I ran into two very minor snags here. First, I don't have BIND installed, so I needed to pass -f to bfu. Second, I don't have D-BUS installed, and had to comment that check out of the bfu script. Once that's done, it goes off and does its thing happily.

Once the BFU finishes, you'll be put into a safe environment with tools built to work regardless of how horribly the BFU may have messed up your system (not an issue here, as we aren't actually modifying our current rootfs). At that point you'll need to resolve the conflicts it lists; thus far I have not had an issue with using Automated Conflict Resolution (acr) to merge those files.


[root@octopus]:[~]# bfu -f /tmp/archives-nightly-osol-nd/i386 /rootpool/b75
bfu# /opt/onbld/bin/acr /rootpool/b75

And that's it. Your clone has now been upgraded using BFU. Create a boot archive for the new BE and set its mountpoint back to legacy.


[root@octopus]:[~]# bootadm archive-update -R /rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=legacy rootpool/b75

You have a couple of options for managing your boot environments at this point. You can either modify /rootpool/boot/grub/menu.lst yourself, or use Tim Foster's zfs-bootadm.sh to do it for you. The script relies on a property to determine which ZFS filesystems are bootable, so you'll need to set that on the new clone.


[root@octopus]:[~]# ./zfs-bootadm.sh
Usage: zfs-bootadm.sh [command]

where command is one of:
create
Creates a new bootable dataset as a clone
of the existing one.
activate
Sets a bootable dataset as the next
dataset to be booted from.
destroy
Destroys a bootable dataset. This must not
be the active dataset.
list
Lists the known bootable datasets.

[root@octopus]:[~]# zfs set bootable:=true rootpool/b75
[root@octopus]:[~]# ./zfs-bootadm.sh list
b74 (current)
b75
test
[root@octopus]:[~]# ./zfs-bootadm.sh activate b75
Currently booted from bootable dataset rootpool/b74
On next reboot, bootable dataset rootpool/b75 will be activated.
[root@octopus]:[~]# reboot

The box reboots, and...


[bda@moneta]:[~]$ ssh root@octopus
Last login: Fri Nov 16 02:50:21 2007 from 10.10.1.20
Sun Microsystems Inc. SunOS 5.11 snv_75 October 2007
bfu'ed from /tmp/archives-nightly-osol-nd/i386 on 2007-11-16
Sun Microsystems Inc. SunOS 5.11 snv_74 October 2007
[root@octopus]:[~]# uname -a
SunOS octopus 5.11 snv_75 i86pc i386 i86pc
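
If you'd rather not eyeball the banner every time, the same check is easy to script. A small sketch, assuming `uname -v` reports the build string (snv_75 here, as in the uname -a output above). verify_build is a made-up name, and its optional second argument is there only so the function can be tried out on another machine.

```shell
# Hypothetical check: succeed only if the running build matches the
# expected one. By default the running build is read from `uname -v`;
# pass a second argument to override it (handy for testing off-box).
verify_build() {
    expected=$1
    actual=${2:-$(uname -v)}
    if [ "$actual" = "$expected" ]; then
        echo "ok: running $actual"
    else
        echo "mismatch: expected $expected, running $actual" >&2
        return 1
    fi
}

verify_build snv_75 || echo "not on the new BE (yet?)"
```

A mismatch after activating the new BE usually means you booted the old menu.lst entry.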

Pretty dang cool stuff!

My initial test here was to BFU from b74 to b76. There was some fumbling about with where the menu.lst file was (I knew it was stored on the rootpool from reading Lori Alt's weblog and various presentations, but rootpool was a legacy mount, so stupid tired me was confused for a good ten minutes). The BFU and acr themselves appeared to be fine, and when I finally got the BE to boot, it... panicked.

I was somewhat discouraged, but booted right back into b74 and BFU'd happily to b75.

Which was the entire point of the exercise: To upgrade the system and have a safe way to fall back to a previous build if the system becomes unusable. As I said, it's the poor man's LiveUpgrade, but LU doesn't currently support zfsboot. And, really, this just seems much quicker and easier to deal with.

There are plenty of little things to figure out still, like which filesystems are required to be on the BE for the BFU to work (so I don't end up with data being snapshotted forever), how to deal with package upgrades, and the like. But overall... very, very cool.

Another thing to note is that everything above was gleaned not just from documentation but from the blogs of the developers.

November 16, 2007 11:19 AM