"That which is overdesigned, too highly specific, anticipates outcome; the anticipation of outcome guarantees, if not failure, the absence of grace."
-- William Gibson, All Tomorrow's Parties
OpenBSD Software RAIDs

Spent yesterday fighting with our crappy backup staging server at work (where things go before they're taped, and stay live for a while). The thing is a junk Gateway "server" box that was super cheap (always a prevailing concern for hardware purchases there, if we can't offset the cost somehow), but it has since proven to be an enormous pain in the butt (you get what you pay for).

The machine has, at various points, had its motherboard, RAM, and CPU replaced, and just when I finally got the thing working, the primary IDE bus blew out. Pretty awesome, but a minor fix, as the board has three IDE buses (getting that third one to work is something I really should have documented; it was a pain, iirc).

Initially the machine was running Linux with ReiserFS on a software RAID across three IDE disks on dedicated buses. Slow as hell, but it worked.

Then the ReiserFS journals blew their trees all over the place, and since the data is taped anyway, I figured I'd give OpenBSD's software RAID (RAIDframe, ported from NetBSD) a whirl.

I unplugged the RAID disks so I wouldn't get confused during the install (a good habit), pointed the installer at an FTP mirror, and went and did other work for a while.

After the machine installed itself (not counting download times, about ten minutes of work... much less-than-three to OBSD), I recompiled the kernel, pinning wd0 to the first channel of the secondary IDE bus (it wanted to boot off the wrong disk otherwise), and started fighting with getting the third IDE bus recognized (the BIOS and bootloader saw the drive on it fine, but the kernel refused to). I spent an hour or so trying to figure out how to get the bus's device attributes (what device number it is, etc.), and failed pretty badly. I don't remember how I got it working for the previous Linux install, though I did try booting Debian, as I have vague recollections of the default Debian kernel seeing it.
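(The pinning itself is just an edit to the kernel config before recompiling. Going from memory, so treat the exact wildcard line as approximate; in your copy of GENERIC it's something like:)

# replace the wildcard attachment...
#wd* at pciide? channel ? drive ? flags 0x0000
# ...with a pinned one, so wd0 can't wander between buses:
wd0 at pciide0 channel 1 drive 0 flags 0x0000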

Eventually gave up on that and threw another PCI IDE card in the machine. Pinned the RAID disks in place (config -e with -o is pretty great) so they couldn't ever move around on me, and started setting up the array as described in raidctl(8). Pretty simple stuff, though I have to admit that it seemed odd (at first) that I needed an FS_RAID type disklabel partition on each of the array's drives.
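For anyone following along, the rough sequence is below, paraphrasing raidctl(8) from memory; the serial number is an arbitrary integer and the device names are just what's in my box:

[root@anubis]:[~]# disklabel -E wd1                  # set partition a's fstype to RAID (same for wd2, wd3)
[root@anubis]:[~]# raidctl -C /etc/raid0.conf raid0  # force-configure the array from the config file
[root@anubis]:[~]# raidctl -I 106 raid0              # stamp the component labels with a serial number
[root@anubis]:[~]# disklabel -E raid0                # give the new device a label
[root@anubis]:[~]# newfs /dev/rraid0a                # and a filesystem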

Get the RAID device formatted, mount it, and... it's somehow managed to lose about 100GB of space. There are three drives: two 160s and one 120. So there should be somewhere in the vicinity of 410GB usable space, 440 total.
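Quick sanity math, assuming the usual decimal-versus-binary gigabyte fudge (drive makers count 10^9 bytes to the GB, df counts 2^30):

2 x 160GB + 120GB = 440GB as sold
160e9 / 2^30 ~= 149GB as df sees it
120e9 / 2^30 ~= 112GB
149 + 149 + 112 ~= 410GB expected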

Machine would only see 330GB tops. I pulled the 120 out, fed it another 160, rebuilt the array... yeah. Pulled the 160 out, built the array up again, and it would only see 300... There's a warning stating that it's "truncating" the last disk, but googling for it returns nothing.

Next stop is the mailing lists, and reading the RAIDframe code to see what causes it.

Obviously I'm doing something incredibly stupid here. My initial thought was block size, but... that seems unlikely considering the amount of space involved.

So yeah. If anyone has any ideas on this one, I'd appreciate it before I just reinstall Linux on the box (tomorrow, heh, as I'm tired of not having an easy place to do backups to).

June 8, 2004 10:32 AM
Comments

I suppose I should have mentioned it was RAID 0, so there shouldn't be any space lost save for filesystem overhead, superblocks, etc.

Posted by: bda at June 8, 2004 10:44 AM

[root@anubis]:[~]# cat /etc/raid0.conf
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/wd1a
/dev/wd2a
/dev/wd3a

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
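(If I'm reading raidctl(8) right: that layout line is 64 sectors -- 32KB -- per stripe unit, one stripe unit per parity unit and per reconstruction unit, RAID level 0; and the queue section means FIFO scheduling, up to 100 requests deep.)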

Posted by: bda at June 8, 2004 11:55 AM

[root@anubis]:[~]# raidctl -u raid0
[root@anubis]:[~]# raidctl -C /etc/raid0.conf raid0

Jun 8 11:58:56 anubis /bsd: raid0 detached
Jun 8 11:59:02 anubis /bsd: raid0: Component /dev/wd1a being configured at row: 0 col: 0
Jun 8 11:59:02 anubis /bsd: Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Jun 8 11:59:02 anubis /bsd: Version: 2 Serial Number: 106 Mod Counter: 207
Jun 8 11:59:02 anubis /bsd: Clean: Yes Status: 0
Jun 8 11:59:02 anubis /bsd: raid0: Component /dev/wd2a being configured at row: 0 col: 1
Jun 8 11:59:02 anubis /bsd: Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
Jun 8 11:59:02 anubis /bsd: Version: 2 Serial Number: 106 Mod Counter: 207
Jun 8 11:59:02 anubis /bsd: Clean: Yes Status: 0
Jun 8 11:59:02 anubis /bsd: raid0: Component /dev/wd3a being configured at row: 0 col: 2
Jun 8 11:59:02 anubis /bsd: Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
Jun 8 11:59:02 anubis /bsd: Version: 2 Serial Number: 106 Mod Counter: 207
Jun 8 11:59:02 anubis /bsd: Clean: Yes Status: 0
Jun 8 11:59:02 anubis /bsd: WARNING: truncating disk at r 0 c 2 to 320172929 blocks.
Jun 8 11:59:02 anubis /bsd: raid0 (root)

Posted by: bda at June 8, 2004 12:00 PM

Hm. The truncating warning appears to have something to do with offsets on the drives, I think...

wd1:
#          size   offset  fstype  [fsize bsize cpg]
  a:  320172993       63    RAID                      # (Cyl. 0*- 317631)
  c:  320173056        0  unused       0     0        # (Cyl. 0 - 317631)

wd2:
#          size   offset  fstype  [fsize bsize cpg]
  a:  320172993       63    RAID                      # (Cyl. 0*- 317631)
  c:  320173056        0  unused       0     0        # (Cyl. 0 - 317631)

wd3:
#          size   offset  fstype  [fsize bsize cpg]
  a:  320173056        0    RAID                      # (Cyl. 0 - 317631)
  c:  320173056        0  unused       0     0        # (Cyl. 0 - 317631)

Hmmm! The a partitions on wd1 and wd2 start at sector 63, and are 63 sectors smaller than wd3's, which starts at 0.

Posted by: bda at June 8, 2004 12:17 PM

Killed the array, changed the offsets of the first two disks (to start at 0 as opposed to 63, using the "b" command in the disklabel editor -- I'm too lazy to use CLI switches, so far), rebuilt the array... formatted it, and...

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/raid0a    450G    2.0K    428G     0%    /mnt

Much better. :)
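In case it saves anyone else the head-scratching, the disklabel dance went roughly like this for each of wd1 and wd2 (from memory, so the exact prompts may be off):

[root@anubis]:[~]# disklabel -E wd1
> b                # adjust the boundaries of the OpenBSD portion of the disk
Starting sector: 0
Size ('*' for entire disk): *
> m a              # then fix partition a's offset/size to match the new bounds
> w                # write the label
> q

After that, all three components start at offset 0 and come out the same size, so nothing gets truncated.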

Posted by: bda at June 8, 2004 12:54 PM

I should probably be using ccd(4) for this, as it's RAID 0, but... oh well. The kernel is already compiled and RAIDframe seems quite happy. So good on me. (A rough sketch of the ccd version is below, for reference.)

It's always the simple things that mess stuff up.
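The ccd version would have looked something like this, if I'm remembering ccd(4) and ccdconfig(8) correctly -- an interleave of 64 sectors to match the RAIDframe config, and the kernel needs the ccd pseudo-device:

# /etc/ccd.conf
# ccd	ileave	flags	component devices
ccd0	64	none	/dev/wd1a /dev/wd2a /dev/wd3a

[root@anubis]:[~]# ccdconfig -C        # configure everything listed in /etc/ccd.conf
[root@anubis]:[~]# disklabel -E ccd0   # label it
[root@anubis]:[~]# newfs /dev/rccd0a   # filesystem, mount, done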

Posted by: bda at June 8, 2004 1:08 PM