"That which is overdesigned, too highly specific, anticipates outcome; the anticipation of outcome guarantees, if not failure, the absence of grace."
-- William Gibson, All Tomorrow's Parties
August 11, 2004

Got bored today and decided to install Solaris 10 beta 5 on some boxes. Keeping in mind that my experiences with commercial UNIX have always left a sour taste in my mouth (IRIX, AIX), and that I have very specific ideas about what UNIX is, you greybeards may want to take this with a shot of J.D. or something. Also keep in mind that this is beta software.

February 21, 2006

Installed Solaris 10 on my Dell PowerEdge 1400SC the other day. Just got around to logging into it. Man, I don't remember anything about Solaris except ps -ef. Pretty sad. It took a damn long time to get installed, and did not enjoy playing with my cheaper, far more generic, bitch system. Had to swap the PE out of being a fileserver so I could get Solaris installed.

Going to be playing with the Sun LDAP server, methinks. If Solaris Zones actually did virtualization, I would consider using them for some virtual server stuff I'm going to need to do, but pity, they aren't. Still, they're pretty awesome. I just don't know what I'd want to give someone a shell on a Solaris box for. ;-)

(Note: Image does not actually inspire confidence.)

March 25, 2006

I guess almost two months ago now I started playing around with Solaris 10. I spent a lot of time reading up on it, and even ordered a SunFire X2100 because I figured I might actually want to run it in production (things like Zones, DTrace, SMF, etc, just are that awesome). I probably wouldn't have thought of getting the SunFire, but mdxi was very positive about his experience with the machine.

As I get older, I seem to notice the noises the computers in my room make more and more. Whenever my fileserver (which is across the room -- but it's a really small room) runs its cron jobs at night (which are all very heavy I/O) it sounds like a small motor is chunking through a duck or something. Last night I finally got tired enough of it that I cleaned out my closet with the intention of moving any machines that don't require a display in there. I also figured I would take the opportunity to do something productive with Solaris 10: Replace my current OpenBSD Samba server.

February 5, 2007

For the last week and a half I've been learning up on Solaris 10 again. The last time I touched it, about a year ago, I was just screwing around with no real interest in using it in a production environment. After reading a few posts over at Theo Schlossnagle's blog regarding choosing Solaris over Linux and his OSCON slides relating to the same, both relating to PostgreSQL performance, I became much more interested in Solaris 10.

(hdp made noises about how evidently the company Schlossnagle works for wrote OmniMTA, which is what the Cloudmark Gateway product uses, among other things; evidently it's a small enough world, after all.)

We have a service at work which stores spam for 30 days. We refer to the messages as "discards", because the system has decided you probably don't want to see them, but it's not like we're going to drop the things on the floor. The thing is, it's insanely slow, right to the very edge of usability (and probably beyond for the vast majority of people). Getting results out of the database takes minutes.

There are a number of issues with the system as a whole, but evidently Postgres configuration is not one of them (jcap, my predecessor, set the service up properly, and a PgSQL Guy agreed there wasn't much else could be done on the service end). So that leaves hardware and OS optimizations. The hardware is fine, save for the fact it's running on metadisk, which is running on SATA (read: bloody slow, especially for PgSQL, which is a pig for disk I/O). We'll be fixing that with a SCSI JBOD and a nice SCSI RAID card RSN. The OS is Linux, and has also been optimized... to a point. Screwing with the scheduler would probably get me something. However, based on my own research (I've read both the Basic Administration and Advanced Administration books over at docs.sun, as well as numerous websites, etc), and Schlossnagle's posts, I've made up my mind that Solaris is the way to go here. So what sold me?

Well, there's the standard features all new to Solaris 10:

  • ZFS
  • StorageTek Availability Suite (I can't seem to get away from network block devices... we use DRBD right now, and frankly I've really come to hate it; but the basic idea is sound enough and far too useful to ignore)
  • Fault Management
  • Zones (not very useful to me in this case)
  • Service Management Facility (while not a deal-breaker or maker, it's incredibly nice being able to define service dependencies and milestones, it also ties into FM)
  • DTrace (for me, this is a deal-maker; check out the DTraceToolkit for examples why, compared to debugging problems under Linux, it's a huge win for any admin)
  • Trusted Extensions (while really interesting and hardcore, not something I care much about just yet)
  • Stability (not only in terms of the system itself, but the APIs, ABIs, and the like; you can use any device driver written for the last three major versions of Solaris in 10 -- compare not only the technology there, but the philosophy behind it, to any freenix)
  • RBAC (while not something I'm going to use immediately, it's something that I really want to utilize moving forward)

That's a fair feature-set that should get any admin to perk up and take notice. Of course, if it weren't for OpenSolaris I wouldn't care. Solaris 8 and 9 are sturdy and well-known systems, but I have no interest in them. They don't get me anything except service contracts and annoying interfaces. With OpenSolaris, Sun is actively making progress in the same friendly directions freenixes have always tried for -- while adding some seriously engineered and robust tech into the mix. It's a nice change. A more open development model, with lots of incremental releases (building into an official Solaris 10 release every six months or so), gives me the warm fuzzies.

So, now that the advertisement is out of the way, what are my impressions after a week of configuring and using it?

Well, Solaris with great new features is still Solaris. Config files are in strange places for legacy reasons, there are symlinks to binaries (or binaries) in /etc, SunSSH is ... well. SunSSH (perhaps sometime soon they'll just switch over to OpenSSH and all the evil politicking can be forgotten, yes?). /home is not /home because it's really /export/home.

Commands exist in odd locations that aren't in my path by default, logging is strange. In short, it's a commercial UNIX. It's steeped in history and the reasons for things are not always immediately clear. The documentation (from docs.sun, OpenSolaris, and the man pages) is excellent. I am not coming to Solaris as a total newb. I've used it before, but not particularly extensively; the learning curve is expectedly high.

As always, UNIX is UNIX. Nothing changes but the dialect and where they put the salad fork.

So, I've got this core system that does lots of really great stuff, some of which is confusing and maybe not so great, but overall it's a pretty obvious win. Unfortunately it has a bunch of tools I'm not used to, or don't like, and it lacks a lot of tools I require. So I need to go out and find a third-party package management utility. Well, you've got Sun Freeware, which is pretty basic. There's Blastwave, which has a large repository of software, a trivial way of interfacing with it all, but seems to have some QA issues (that's an old impression and may have become invalidated).

And then there's pkgsrc, the NetBSD Ports System. And you know what? It's pretty great. After bootstrapping pkgsrc's gcc from Sun Freeware's (Sun packages gcc now, so you have access to a compiler with the OS -- this was not true before -- but apparently Sun's gcc is Weird and not to be trusted), I was building packages with no issues whatsoever. OpenSSH, Postfix, PgSQL, vim7... Anyway, with an hour's worth of work (which only ever need be done once, on one system, to build the packages), you've got all the programs you're used to using, or require. Suddenly the weird and craggy vastness of Solaris -- expat from the world of commercial UNIX -- becomes much more friendly and livable.

A couple simple hints about your environment: Set TERM=dtterm and add TERMINFO=/usr/share/lib/terminfo. The former seems to be the proper terminal type for xterm or xterm-like terminals, and the latter fixes pgup/pgdown in pagers and vim, though not in Sun's vi.
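
For what it's worth, here is a minimal sketch of those two settings as a profile fragment. It assumes a Bourne-style login shell and that the terminfo database lives at that path on your install; putting it in ~/.profile is an assumption, the values are the ones above.

```shell
# ~/.profile fragment (sketch; the file name is an assumption)

# Terminal type for xterm or xterm-like terminals, per the hint above.
TERM=dtterm

# Point curses programs (pagers, vim) at the system terminfo database.
TERMINFO=/usr/share/lib/terminfo

export TERM TERMINFO
```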

It's also easy to create your own packages -- something we've been wanting to do at work for a long time (before I started, certainly). Moving our current custom "packaging" system to pkgsrc would be tedious, but certainly something we could automate with some work. Standardizing on it would be a big win not just for the Solaris servers, but for the architecture as a whole. So, a double win.

(I would be remiss not to mention Nexenta, a project which merges GNU software into OpenSolaris's base. It's very, very interesting, especially in that they use Ubuntu's repos, but regardless of the purported stability of their alpha releases, I can't say I am very interested in running it on my servers. Still, it's definitely something someone who wants to give Solaris 10 a poke without too much effort should take a look at. The same way that Ubuntu is there for people who want to try out Linux. I imagine, frankly, that eventually they will occupy the same ecological niche.)

As you might have guessed, I'm quite happy with my week and change of testing. All the basic management is similar to my BSD experience, and the vast wealth of information I can trivially get out of the system compared to other UNIXes makes it hard to argue against. pkgsrc means not only I, but our developers, have access to everything they need or are used to. The default firewall is ipf, which I'm not thrilled about (pf is my bag), but is certainly usable, and no doubt an improvement over whatever came before.

My next step is to take a Reduced Networking installation and build it up into a Postgres server running Slony-1 for replication services. I expect it to go pretty smoothly. The step after that will be a JumpStart server to live next to my (now beloved) FAI server.

There are a few things I need to pursue before we roll to live testing, including root RAID (the X2100's nvidia "RAID" is not supported by Solaris 10, weirdly enough). ZFS root is apparently possible, though tedious and weird. It would give me a trivial way to do mirroring, though. An install server would probably make it easier to do (though that's just a guess). Barring that, I'm guessing that a ZFS pool for system volumes (/usr, /var) and a ZFS pool for data/applications would be good enough. Mirroring / in the typical way (which certainly appears to be simple), until ZFS on root becomes common and supported.

I expect I'll drop some more updates as I move forward. Hopefully with good news for our PgSQL performance. ;-)

<bda> "Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS will detect it and use another copy to repair it."
* bda sighs.
<bda> It's so dreamy.
<kitten-> You really need a girlfriend.
<bda> I doubt she'd come with fault management and a scriptable kernel debugger.
<kitten-> I suppose you're right.

February 8, 2007

The LSI SCSI card for our new discards database server (Sun X2100 M2) came in today. I ran up to UniCity to get some cables from Harry (thanks, Harry!) and plugged the Dell PowerVault 210S we got into the thing. All the disks showed up happily in cfgadm -al, but not in format. Telling cfgadm to configure didn't seem to do much for me, so I decided to be lazy and touch /reconfigure and reboot. Once that was done, all seemed to be quite happy.

The goal here is to create a ZFS pool on the JBOD for Postgres to live on. As you can see, it is super easy:

[root@mimas]:[~]# zpool create data2 raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0 c3t13d0
[root@mimas]:[~]# zpool status
pool: data
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
mirror ONLINE 0 0 0
c1d0p3 ONLINE 0 0 0
c2d0p3 ONLINE 0 0 0

errors: No known data errors

pool: data2
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data2 ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
c3t1d0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t5d0 ONLINE 0 0 0
c3t8d0 ONLINE 0 0 0
c3t9d0 ONLINE 0 0 0
c3t10d0 ONLINE 0 0 0
c3t11d0 ONLINE 0 0 0
c3t12d0 ONLINE 0 0 0
c3t13d0 ONLINE 0 0 0

errors: No known data errors
[root@mimas]:[~]# zfs create data2/postgresql
[root@mimas]:[~]# zfs set mountpoint=/var/postgresql2 data2/postgresql
[root@mimas]:[~]# mv /var/postgresql/* /var/postgresql2/
[root@mimas]:[~]# zfs umount data/postgresql
[root@mimas]:[~]# zfs destroy data/postgresql
[root@mimas]:[~]# zfs umount data2/postgresql
[root@mimas]:[~]# zfs set mountpoint=/var/postgresql data2/postgresql
[root@mimas]:[~]# zfs mount data2/postgresql
[root@mimas]:[~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 885M 124G 24.5K /data
data/pkg 885M 124G 885M /usr/pkg
data2 17.1G 346G 44.8K /data2
data2/postgresql 17.1G 346G 17.1G /var/postgresql

RAID-Z is explained in the zpool man page.

While I am a little worried that we really want hardware RAID, I didn't want to spend the extra cash. If we do end up needing the extra performance, trust me, the system can be repurposed easily. ;-)

woot.

February 10, 2007

In which our intrepid sysadmin may as well have been eating wall candy as doing work for all it got him.

I spent all of Friday and most of last night trying to figure out why pgbench was giving me what seemed to be totally insane results.

  • >1000 tps on SATA 3.0Gb/s (Linux 2.6, XFS, metadisk + LVM)
  • ~200 tps on SATA 3.0Gb/s (Solaris 10 11/06, UFS)
  • ~500 tps on SATA 3.0Gb/s (Solaris 10 11/06, ZFS mirror)
  • <100 tps on 10x U320 SCSI (Solaris 10 11/06, ZFS RAID-Z)

Setting recordsize=8192 for the ZFS volume is helpful. Or whatever your PostgreSQL blocksize is compiled with.
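
A sketch of how that might be applied. recordsize has to be a power of two between 512 bytes and 128K, so a tiny guard in front of zfs set doesn't hurt. The function names are made up, data2/postgresql is the dataset from the previous entry, and 8192 is only PostgreSQL's default block size, so check yours first. This assumes bash or ksh; Solaris 10's /bin/sh is too old for $(( )).

```shell
# valid_recordsize BYTES: succeed if BYTES is a legal ZFS recordsize,
# i.e. a power of two from 512 up to 131072. Hypothetical helper.
valid_recordsize() {
    n=$1
    [ "$n" -ge 512 ] && [ "$n" -le 131072 ] || return 1
    # Halve until we reach 1; any odd value on the way means not a power of two.
    while [ "$n" -gt 1 ]; do
        [ $((n % 2)) -eq 0 ] || return 1
        n=$((n / 2))
    done
    return 0
}

# set_recordsize DATASET BYTES: only calls zfs if the size is sane.
set_recordsize() {
    if valid_recordsize "$2"; then
        zfs set recordsize="$2" "$1"
    else
        echo "refusing: $2 is not a valid recordsize" >&2
        return 1
    fi
}

# Usage (not run here): set_recordsize data2/postgresql 8192
```

Note that recordsize only applies to files written after the change, so set it before loading the database, not after.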

It basically came down to Postgres fsync() absolutely murdering ZFS's write performance. I couldn't understand why XFS would perform 10 times as well as ZFS on blocking writes. Totally brainlocked.

And of course the bonnie++ benchmarks were exactly what I expected: the SCSI array was kicking ass all over the place... except with -b enabled, while the lone SATA drive just sort of sucked air next to it on any OS/filesystem.

The zpool iostat <pool> <interval> command is seriously useful. Give it -v to get per-disk stats.

Anyway, I was banging my head against the wall literally the entire day. The ticket for "Test PgSQL on Sol10" is more pages long than I'd like to think about. Finally I bitched about it in #slony, not out of any particular thought that I'd get an answer, just frustration. mastermind pointed out that, hey, cheap SATA drives have, you know... write caches. And like? SCSI drives? They have it like, turned off by default, because reliability is far more important than performance for most people. (Valley-girlese added to relate how stupid I felt once he mentioned it.) And anyway, that's what battery-backed controllers are for: So you get the perf and reliability.

Once he said "write cache", it clicked, total epiphany, and I felt like a complete jackass for spending more than ten hours fighting with what amounts to bad data. In the process I read a fair amount on good database architecture, though, and that will be very helpful in the coming weeks for making our discards database(s) not suck.

Live and learn, I suppose.

The whole experience has really driven home the point that getting information out of a Solaris box is so much less tedious than a Linux box, though. While I was under the impression that the SATA drives were not actually lying to me about their relative performance, I kept thinking "These Sol10 tools are so frakking nice, I don't want to get stuck with Linux tools for this stuff!" Especially the ZFS interfaces.

February 22, 2007

OpenSolaris/Solaris Relationships

A useful entry for people who are confused by how OpenSolaris turns into Solaris 10, and what the differences actually are.

Mercurial: a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.

dragorn pointed this out the other day. After having to endure the fist-shaking of my co-workers at svn/svk perhaps they will be satiated. Or not.

OpenSolaris also uses hg, it seems.

March 14, 2007

So as part of freeing up some rackspace at work, I'm throwing a bunch of systems into Solaris Zones. However, some of these systems, while not "mission critical" are pretty important and their IP addresses really shouldn't change (DNS propagation lag would suck).

So my Solaris Zones box is sitting on one of our subnets at the colo, the one with the most free addresses. Two of these other systems, however, are on another subnet. There's no good way to currently add a default route for a local zone when the global zone is not also part of that network. I could either waste an IP in that subnet (which I don't want to do), or follow this suggestion and ghetto-hack around it:


[root@chironex]:[~]# cat /etc/hostname.nge0\:99
0.0.0.0
[root@chironex]:[~]# ifconfig nge0:99 plumb up
[root@chironex]:[~]# ifconfig -a
nge0:99: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 0.0.0.0 netmask ff000000
[root@chironex]:[~]# zonecfg -z ircd info
zonename: ircd
zonepath: /export/zones/ircd
autoboot: true
pool:
limitpriv:
fs:
dir: /opt
special: /opt
raw not specified
type: lofs
options: [ro,nodevices]
net:
address: A.B.C.D
physical: nge0
[root@chironex]:[~]# ifconfig nge0:99 A.B.C.D netmask 255.255.255.248
[root@chironex]:[~]# route add default 1.2.3.4
add net default: gateway 1.2.3.4
[root@chironex]:[~]# ifconfig nge0:99 0.0.0.0 netmask 255.0.0.0
[root@chironex]:[~]# zoneadm -z ircd boot
[root@chironex]:[~]# ifconfig -a
nge0:5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
zone ircd
inet A.B.C.D netmask fffffff8 broadcast 1.2.3.5

Works just fine, though.

(If it came down to some network-contention problems, I could pull the same trick on bge0, another physical device in the system... but it won't.)
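
Boiled down, the trick is three commands run in the global zone before the zone boots. A dry-run sketch that only prints them: the function name and argument order are made up, nge0:99 and the 255.0.0.0 reset are straight from the transcript above, and the example addresses are documentation placeholders, not ours.

```shell
# zone_route_cmds ALIAS ZONE_ADDR NETMASK GATEWAY
# Prints (does not run) the commands for the default-route workaround.
zone_route_cmds() {
    # 1. Borrow the zone's address on a placeholder alias so the gateway is reachable.
    echo "ifconfig $1 $2 netmask $3"
    # 2. Add the default route for the zone's subnet.
    echo "route add default $4"
    # 3. Reset the alias to 0.0.0.0 so the zone can claim the address at boot.
    echo "ifconfig $1 0.0.0.0 netmask 255.0.0.0"
}

zone_route_cmds nge0:99 192.0.2.10 255.255.255.248 192.0.2.9
```

Once the printed commands look right, piping the output to sh in the global zone would run them; then boot the zone as before.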

March 16, 2007
March 17, 2007

Picked up off osnews. Yeah, I should know better. I might catch something.

Solaris 10 11/06 (Update 3) Review

Solaris Express is coming along; and for those who do want bleeding edge, ultra-super-duper features, then Solaris probably isn't your best bet, then again, assuming you're into that stuff, you'd be better catered for by the likes of Gentoo for example - for those of us who would prefer to have stability above features, then give Solaris a go and if you can make a contribution to Solaris by way of code contributions, then by all means do so.

Recommending Gentoo over Solaris.

Gentoo.

To stay in context, though, he is talking about the desktop market. So why did he review Solaris 10? Why not Nexenta, which is geared for exactly that?

And Gentoo. Instead of say... Ubuntu.

Also, Solaris lacking features that Linux has? That's a bloody joke and a half:

Whatever.

There was this noise a minute ago? Weird wooshing sound? Right over someone's head.

This has been your monthly blog rant against some other blog post some blogging guy somewhere wrote about something.

April 16, 2007

Live Upgrade for Idiots

Huzzah, I was going to start looking for something like that. :-)

Added that site to my feeds as well. woop.

[via stevel]

May 18, 2007

I've been spending a lot of time working on consolidating our services. It's tedious, because we have been a Linux shop since the company was started: there are many Linux and GNUisms. I have yet to question the decision to move as much as I can to Solaris 10, however. Consider:

We have, in the past, had two (more or less) dedicated systems for running MySQL backups for our two production databases. These replicas would stop their slave threads, dump, compress, and archive to a central system. Pretty common. But they were both taking up resources and rackspace that could otherwise be utilized.

Enter Solaris Zones. There's no magic code required for mysqldump and bzip2, so moving them was trivial. The most annoying part of building any new MySQL replica is waiting on the import. But, if you admin MySQL clusters you're probably already really, really used to that annoyance.

So I built a new database backup zone to run both instances of MySQL. Creatively, I named it dbbackup. It ran on zhost1 (hostnames changed to protect the innocent). Unfortunately, zhost1 also runs all our development zones (bug tracking, source control, pkgsrc and par builds) as well as our @loghost. Needless to say, the addition of two MySQL dbs writing InnoDB pretty much killed I/O (this is on an X2100 M1 with mirrored 500GB Seagate SATA 3.0Gb/s drives), making the system annoying to use.

This week I deployed two new systems to act as zone hosts, one of which is slated for non-interactive use. So last night I brought down the database backup zone and migrated it over.

This document details the process, which is ridiculously trivial. No, really. The most annoying part was waiting on the data transfer (60GB of data is slow anywhere at 0300).

My one piece of extra advice is: Make sure both systems are running the same patch level before you start. PCA makes this pretty trivial to accomplish.

This is a sparse-root zone, but there are two complications:

  • I delegate a ZFS dataset to the zone, so there are a bunch of ZFS volumes hanging off it. However, they all exist under the same root as the zone itself, so it's not really a big deal.
  • I have a ZFS legacy volume set up for pkgsrc. By default pkgsrc lives in /usr/pkg, /usr is not writable since it's a sparse zone, and I don't really want to deal with moving it. It needs to be mounted at boot time (before the lofs site_perl mounts which contain all our Perl modules in the global zone), however, and after a little bit of poking I couldn't figure out how to manipulate zvol boot orders. Legacy volumes get precedence over lofs, though, so. Ghetto, I know.

The volume setup looks like this:

[root@dbbackup]:[~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 85.3G 348G 24.5K /data
data/zones 41.2G 348G 27.5K /export/zones
data/zones/dbbackup 40.4G 348G 135M /export/zones/dbbackup
data/zones/dbbackup/tank 39.9G 348G 25.5K none
data/zones/dbbackup/tank/mysql 39.9G 348G 8.49G /var/mysql
data/zones/dbbackup/tank/mysql/db2 24.5G 348G 24.5G /var/mysql/db2
data/zones/dbbackup/tank/mysql/db1 6.92G 348G 6.92G /var/mysql/db1

So, my process?

First, shut down and detach the zone in question.

[root@zhost1]:[~]# zlogin dbbackup shutdown -y -i0
[root@zhost1]:[~]# zoneadm -z dbbackup detach

Make a recursive snapshot of the halted zone. This will create a snapshot of each child hanging off the given root, with the vanity name you specify.

[root@zhost1]:[~]# zfs snapshot -r data/zones/dbbackup@migrate

Next, use zfs send to write each snapshotted volume to a file.

[root@zhost1]:[~]# zfs send data/zones/dbbackup@migrate > /export/scratch/dbbackup@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/pkgsrc@migrate > /export/scratch/dbbackup-pkgsrc\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank@migrate > /export/scratch/dbbackup-tank\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql@migrate > /export/scratch/dbbackup-tank-mysql\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql/db2@migrate > /export/scratch/dbbackup-tank-mysql-db2\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql/db1@migrate > /export/scratch/dbbackup-tank-mysql-db1\@migrate
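
Six datasets is fine to type by hand; sixty is not. A sketch of the same step as a loop, assuming bash or ksh, that zfs list accepts -H -r -t snapshot -o name on your release, and the same data/zones prefix and /export/scratch directory as above. dump_name and send_all are made-up names.

```shell
# dump_name SNAPSHOT: turn a snapshot name into the dump file name used above,
# e.g. data/zones/dbbackup/tank@migrate -> dbbackup-tank@migrate
dump_name() {
    echo "$1" | sed -e 's|^data/zones/||' -e 's|/|-|g'
}

# send_all ROOT TAG: write every snapshot under ROOT named @TAG to /export/scratch.
send_all() {
    zfs list -H -r -t snapshot -o name "$1" |
    grep "@$2\$" |
    while read snap; do
        zfs send "$snap" > "/export/scratch/$(dump_name "$snap")"
    done
}

# Usage (not run here): send_all data/zones/dbbackup migrate
```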

Now, copy each of the dumped filesystem images to the new zone host (zhost2), using scp or whatever suits you. Stare at the ceiling for two hours. Or catch up on Veronica Mars and ReGenesis. Whichever.

Once that's finished, use zfs receive to import the images into an existing zpool on the new system.

[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup < dbbackup\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/pkgsrc < dbbackup-pkgsrc\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank < dbbackup-tank\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql < dbbackup-tank-mysql\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql/db2 < dbbackup-tank-mysql-db2\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql/db1 < dbbackup-tank-mysql-db1\@migrate

Before I could attach the zone, I needed to set the mountpoints for the dataset and legacy volumes properly.

[root@zhost2]:[~]# zfs set mountpoint=legacy data/zones/dbbackup/pkgsrc
[root@zhost2]:[~]# zfs set mountpoint=none data/zones/dbbackup/tank

Also, since the zone was living on a different network than the host system, I needed to add a default route for that network to the interface. I talked about this earlier, and it's a workaround that should be going away once NIC virtualization makes it into Solaris proper from OpenSolaris (I would guess u5?).

[root@zhost2]:[~]# ifconfig nge0:99 A.B.C.D netmask 255.255.255.224
[root@zhost2]:[~]# route add default A.B.C.1
add net default: gateway A.B.C.1
[root@zhost2]:[~]# ifconfig nge0:99 0.0.0.0 netmask 255.0.0.0

Now, create a stub entry for the zone with zonecfg.

[root@zhost2]:[~]# zonecfg -z dbbackup
dbbackup: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:dbbackup> create -a /export/zones/dbbackup
zonecfg:dbbackup> exit

And that's pretty much it.

Attach the zone and boot it and you're done.

[root@zhost2]:[~]# zoneadm -z dbbackup attach
[root@zhost2]:[~]# zoneadm -z dbbackup boot

Once it was up, I logged in, made sure the MySQL instances were replicating happily and closed the ticket for moving the zone.

This level of flexibility and ease of use is key. In addition to the other technologies included in Solaris 10, you'd be crazy not to be utilizing it. (Even with the annoying bits still lurking in Sol10, it's absolutely worth the effort.)

And it's only going to get better.

May 31, 2007

One of my major blocking tasks right now is to rebuild our Perl tree (about 600 modules) as PARs for easy distribution. I'm not using SVR4 or pkgsrc packages for them because I want to have one build system for both our Linux and Solaris systems. Much of our code relies on version-specific behaviors, so the only way for me to actually get stuff ported (without going totally batshit) from the entrenched Linux systems to the Solaris boxes is to rebuild all those modules, at those versions.

Yesterday, I spent most of the day compiling one Perl module (Math::Pari, which relies on the pari math libraries, and engages me in an indecent amount of skullfuckery whenever I try to build it. This particular adventure into stupidity was caused mainly by braindead pkgsrc dependencies... pari relies on teTeX -- to build its documentation -- which relied on X11. I spent a good portion of time trying to get it working the way it wanted until finally just ripping the X bits out. Guess how long that took. Yup. Bare minutes.). Finally I just built pari by hand to /opt and linked Math::Pari against that. Could have saved hours and hours...

Spent all of today building the rest of our Perl modules. Got down from ~600 to 67. Pulling the modules was easy enough once hdp mentioned the by-module listing on the CPAN. The vast majority of modules were well-behaved; it was simply a matter of iterating over the modules, running

perl Makefile.PL --skipdeps && \
make && make test && \
cd blib && \
zip -r $DEST/$MODULE-i386-solaris-5.8.8.par *

and then installing it. Not a big deal. Some of them were tenacious and obnoxious, though, and ate up a lot of time. We can theoretically (this has not proven to be completely true) skip deps as any dependencies should exist in our local tree. I'm sure once I'm done I'll have to check to make sure all the modules actually have their deps, but the vast majority should.
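
The iteration itself can be sketched roughly like this. Assumptions, loudly: bash or ksh, one unpacked module per subdirectory with the directory named the way you want the PAR named, $DEST set to an absolute path, and logs dumped in /tmp. par_name and build_all are made-up names, not what we actually run.

```shell
# par_name MODULE: file name for a built module, same pattern as the command above.
par_name() {
    echo "$1-i386-solaris-5.8.8.par"
}

# build_all SRCDIR: build each unpacked module under SRCDIR into $DEST,
# printing ok/FAIL per module so the tenacious ones are easy to find.
build_all() {
    for dir in "$1"/*/; do
        module=$(basename "$dir")
        # Subshell so a failed cd or build never leaves us in the wrong directory.
        if ( cd "$dir" &&
             perl Makefile.PL --skipdeps &&
             make && make test &&
             cd blib &&
             zip -r "$DEST/$(par_name "$module")" * ) > "/tmp/$module.log" 2>&1
        then
            echo "ok   $module"
        else
            echo "FAIL $module (see /tmp/$module.log)"
        fi
    done
}

# Usage (not run here): DEST=/export/par build_all /export/src/cpan
```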

Tomorrow I get to finish this up and then maybe get some working code running in some zones. Huz-freakin'-zah.

June 6, 2007

OpenSolaris: Five updates conservative developers should make

Some really good points listed therein.

I dream about the days when our code all runs happily on Solaris, is packaged up in SVR4 streams, and I can open "add DTrace providers to stuff" tickets...

June 8, 2007

< brendang> # dtrace -n 'zangband:::'
< brendang> dtrace: description 'syscall:::' matched 470 probes
< brendang> CPU ID FUNCTION:NAME
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 character:dies

July 19, 2007
August 24, 2007

< bda> Legacy.
< bda> This is an inherited mess.
< bda> (Which is not much of an excuse, but it's what I've got.)
< dwc-> bda: it's understandable
< dwc-> I've had plenty of uh, legacy ... cruft. to deal with for awhile before it can finally get tossed
< bda> dwc-: The vast majority of the other cruft has been replaced with shiny, maintainable things.
< bda> This site is hours away, I have no easy mode of transit, blah blah blah.
< bda> Soon I will build a pod of Sun boxes and hitchhike up there with a scary old man who insists I hold his dentures for him.
< Tempt> a pod of Sun boxes
< Tempt> Does the pod open and give birth to an army of drones?
< bda> No, an army of zones.
< Tempt> boom-tish

October 4, 2007

After messing around with plain old Jumpstart for a day, I got sick of it and decided to try out Jumpstart Enterprise Toolkit after eryc mentioned it: a bunch of code living on top of Jumpstart, meant to make lives easier. It does. Getting things set up, adding hosts, etc, goes from being kind of tedious to trivial. The real killer for me was dealing with Solaris's DHCP manager. Man, what a weird, annoying thing.

So now I have Jumpstart set up in Parallels on my laptop (that's 30GB I won't be getting back anytime soon), which is a pretty useful thing to have. I suppose next I'll set an FAI VM for those Debian boxes I still haven't replaced...

Here is the HOWTO I used as a starting point, and also the JET wiki.

Someone in #opensolaris yesterday mentioned they had a Debian Etch branded zone running. And it looks pretty trivial to do, too.

Derek Crudgington, of Joyent, has a post over on his blog about using DTrace to instrument MySQL (which does not have any DTrace probes). As long as you know the function names you're interested in, you can get some really useful information out of it.

The fact that you can get that information, which would typically get you a major performance hit from MySQL itself, without MySQL having to be touched, restarted, or impaired, is just another example of how great DTrace is.
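
To give a flavor of the idea (a minimal sketch, not Derek's script): the pid provider can count function calls in a running mysqld without restarting it. This assumes a single mysqld process, root or the appropriate dtrace privileges, and bash or ksh for the $(...) syntax; count_mysqld_calls is a made-up name.

```shell
# count_mysqld_calls: count function entries in a running mysqld for ten
# seconds, then print the totals. Function names show up C++-mangled.
count_mysqld_calls() {
    pid=$(pgrep -x mysqld | head -n 1)
    [ -n "$pid" ] || { echo "mysqld is not running" >&2; return 1; }
    # $target expands to the pid given with -p; the aggregation is
    # printed automatically when the script exits.
    dtrace -p "$pid" -n '
        pid$target:mysqld::entry { @calls[probefunc] = count(); }
        tick-10s { exit(0); }'
}

# Usage (not run here): count_mysqld_calls
```

Matching every function in the binary is heavy-handed and costs something while it runs; in practice you would narrow the probe description to the handful of functions you actually care about.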

October 9, 2007

Several months ago, after watching Bryan Cantrill's DTrace talk at Google, I went looking for the then-current state of DTrace userstack helpers for Perl. We're a big Perl shop; being able to get ustacks out of Perl would be a pretty major thing for me. I came across a blog post by Alan Burlison who had patched Perl 5.8.8 with subroutine entry/return probes, but couldn't, at the time, find a patch for it. So I forgot about it.

The other day I re-watched that talk and went looking again, discovering in the process that Richard Dawe had reproduced Alan's work and released a diff. Awesome!

So the basic process is this:

  • Get a clean copy of Perl 5.8.8
  • Get Richard's patch
  • Read the instructions in the patch file
    • note that you have to build with a dynamic libperl!
  • Use gpatch to patch the source, and configure Perl as usual
$ cd perl-5.8.8
$ gpatch -p1 -i ../perl-5.8.8-dtrace-20070720.patch
$ sh Configure

As noted by Brendan Gregg, you'll also need to add a perldtrace.o target to two lines in the Makefile (line numbers may differ):

274          -@rm -f miniperl.xok
275          $(LDLIBPTH) $(CC) $(CLDFLAGS) -o miniperl \
276              miniperlmain$(OBJ_EXT) opmini$(OBJ_EXT) $(LLIBPERL) $(libs) perldtrace.o
277          $(LDLIBPTH) ./miniperl -w -Ilib -MExporter -e '' || $(MAKE) minitest
278
279  perl$(EXE_EXT): $& perlmain$(OBJ_EXT) $(LIBPERL) $(DYNALOADER) $(static_ext) ext.libs $(PERLEXPORT)
280          -@rm -f miniperl.xok
281          $(SHRPENV) $(LDLIBPTH) $(CC) -o perl$(PERL_SUFFIX) $(PERL_PROFILE_LDFLAGS) $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(DYNALOADER) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs) perldtrace.o

As the patch instructions state, you'll need to generate a DTrace header file, running:

$ make perldtrace.h
/usr/sbin/dtrace -h -s perldtrace.d -o perldtrace.h
dtrace: illegal option -- h
Usage: dtrace [-32|-64] [-aACeFGHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]

Ouch, ok, apparently dtrace -h is broken on Solaris 10u3. I mentioned this on #dtrace, and Brendan suggested I find a Perl script posted to dtrace-discuss by Adam Leventhal to emulate dtrace -h behavior.

But I'm lazy and have Solaris 10u4 boxes, so I just generate the header file on one of those and copy it over to the u3 box.

Once you have perldtrace.h in place, run make as normal, get a cuppa, whatever.

As soon as your make is done running, check the patch file for instructions on running a simple test to see if it's working. I have yet to have any issues.

Now, as Alan mentions in his blog, there's a chance you could eat a 5% performance hit. For me, that would be worth it, due to the complexity of our codebase and the fact I am sometimes (though thankfully not recently) called upon to debug something I am wholly unfamiliar with at ungodly hours of the night. Digging around for the problem is hard as adding debugging to running production code is simply not going to happen. With a DTrace-aware Perl, it's simply a matter of crafting proper questions to ask and writing wrappers to make the inquiries.

I'm certainly not at a point where I can do that, but I reckon it won't be long after I've deployed our rebuilt Perl packages that I'll be learning "A is for Apple ... D is for DTrace".

To simply quantify that performance hit, rjbs suggested we run the Perl test suite on various builds. Below I have (again, very simple) metrics on how long each build took to run the tests. As DTrace requires a dynamic libperl, which is going to be a performance hit of some (unknown to me) value, I have both static and dynamic vanilla (no DTrace patch) build times listed.

Build type                                       real/user/sys
Vanilla Perl, static libperl                     8m44.880s/3m44.770s/1m41.745s
                                                 8m48.657s/3m48.574s/1m41.623s
                                                 8m46.513s/3m46.272s/1m41.728s
Vanilla Perl, dynamic libperl                    9m41.212s/4m32.217s/1m49.256s
                                                 9m57.276s/4m47.755s/1m49.443s
                                                 9m43.576s/4m34.341s/1m49.520s
Patched Perl, dynamic libperl, not instrumented  10m17.740s/4m32.825s/1m49.017s
                                                 10m16.507s/4m32.982s/1m49.350s
                                                 10m22.689s/4m38.937s/1m49.287s

If the test suite is indeed a useful metric, the hit is certainly not nothin'. I suspect there would be ways to mitigate that hit, though.
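
To put rough numbers on that, here is a small Python sketch that averages the real times from the table above and expresses the difference between builds as a percentage. The function and dictionary names are made up for this example, and three runs per build is hardly a rigorous benchmark, so treat the result as a ballpark figure only.

```python
import re
from statistics import mean

# Wall-clock ("real") times copied from the table above, three runs per build.
REAL_TIMES = {
    "static": ["8m44.880s", "8m48.657s", "8m46.513s"],
    "dynamic": ["9m41.212s", "9m57.276s", "9m43.576s"],
    "patched": ["10m17.740s", "10m16.507s", "10m22.689s"],
}


def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style value such as '8m44.880s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def mean_real(build: str) -> float:
    """Average wall-clock time, in seconds, over all runs of one build."""
    return mean(to_seconds(stamp) for stamp in REAL_TIMES[build])


def overhead_pct(build: str, baseline: str) -> float:
    """How much slower `build` is than `baseline`, as a percentage."""
    base = mean_real(baseline)
    return (mean_real(build) - base) / base * 100.0


if __name__ == "__main__":
    print(f"dynamic vs static:  {overhead_pct('dynamic', 'static'):.1f}%")
    print(f"patched vs dynamic: {overhead_pct('patched', 'dynamic'):.1f}%")
```

With these numbers the patched build comes out roughly 5.4% slower than the vanilla dynamic build, and the dynamic build roughly 11.5% slower than the static one.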

As soon as I gain some clue (or beg someone in #dtrace for the answer), I'll run the same tests while instrumenting the Perl processes. Just need to figure out how to do something like

syscall:::entry
/execname == "perl"/
{
  self->follow = 1;
}

perl$1:::sub-entry, perl$1:::sub-return
/self->follow/
{ ... }


when the Perl processes I want to trace are completely ephemeral.

October 10, 2007

Noticing the question in my previous post about ephemeral processes, seanmcg in #dtrace suggested I write something akin to this, which did occur to me, vaguely, as a possibility. But it seemed like far more complexity than I wanted to create, and starting/stopping processes to kick off watchers sounded like a good way to impact performance in an already loaded environment (read: our mailservers). I knew there had to be a better way to do it than wrapping DTrace up in Perl so I could monitor Perl, but I couldn't figure out how to do it with the pid::: provider. Well, you can't. But!

< brendang> the wildcard "*" doesn't work properly for the pid provider, but does work for the USDT language providers
< brendang> most of the language examples in the new DTraceToolkit use perl*:::, mysql*:::, javascript*:::, etc

Obviously DTT should have been the first place I looked, instead of whining. :-)

So if you are trying to follow something specific with the pid:::, seanmcg's method is certainly viable. I just wanted to glob onto all Perl processes, though.

Brendan also offered the following (as I was thinking about it backwards in my previous post):

#!/usr/sbin/dtrace -Zs

perl*:::sub-entry
{
        self->sub = copyinstr(arg0);
}

syscall:::entry
/self->sub != NULL/
{
        printf("Perl %s() called syscall %s()", self->sub, probefunc);
}

perl*:::sub-return
{
        self->sub = 0;
}

Start 'er up in Terminal A:


[20071010-00:10:31]:[root@mako]:[~]# ./perlsubs.d
dtrace: script './perlsubs.d' matched 232 probes

Kick off one of our simple but venerable helper scripts, with shebang set to the patched Perl:


[20071010-00:10:34]:[root@mako]:[~]# ./spool-sizes.pl -h

usage: spool-sizes.pl [-tabcdimsvh]
-t: global threshold (default = 1000 messages)
-a: active spool threshold (default = $threshold)
-H: hold spool threshold (default = $threshold)
-c: corrupt spool threshold (default = $threshold)
-d: deferred spool threshold (default = $threshold)
-i: incoming spool threshold (default = $threshold)
-n: no mail (do not mail, but create file in /var/tmp/spool-sizes)
-T: add a composite "total" spool
-v: visual (i.e. output to console vs. file and do not mail)
-h: help (this message)

And, back in Terminal A:


CPU ID FUNCTION:NAME
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
...
0 40097 close:entry Perl BEGIN() called syscall close()
0 40325 systeminfo:entry Perl hostname() called syscall systeminfo()
0 40185 ioctl:entry Perl usage() called syscall ioctl()
0 40467 fstat64:entry Perl usage() called syscall fstat64()
0 40093 write:entry Perl usage() called syscall write()
0 40093 write:entry Perl usage() called syscall write()
0 40093 write:entry Perl usage() called syscall write()
...


And here's the output of Alan B's example script:


[20071010-00:10:41]:[root@mako]:[~]# ./perlsubs2.d
dtrace: script './perlsubs2.d' matched 7 probes
^C
CPU ID FUNCTION:NAME
0 2 :END 2 import /opt/perl/perl5.8.8/lib/5.8.8/warnings.pm
3 import /opt/perl/perl5.8.8/lib/5.8.8/strict.pm
6 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/vars.pm
6 bits /opt/perl/perl5.8.8/lib/5.8.8/strict.pm
11 import /opt/perl/perl5.8.8/lib/5.8.8/AutoLoader.pm
25 import /opt/perl/perl5.8.8/lib/5.8.8/Exporter.pm
26 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/Sys/Hostname.pm
32 load /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/XSLoader.pm
62 AUTOLOAD /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/POSIX.pm
68 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/warnings.pm
85 BEGIN ./spool-sizes.pl
271 PERL PERL

This won't be useful at all. Tomorrow I'm going to try and get back to porting our MX dispatching software to Solaris. hdp says all the tests pass, so it should just be a matter of making sure each of the associated daemons work properly, have manifests, etc.

And then, the fun part: Writing a little something I've been referring to as mailflow.d...

October 13, 2007
October 15, 2007

Based on Albert Lee's howto:


[20071015-08:38:52]:[root@clamour]:[~]# uname -a
SunOS clamour 5.10 Generic_120012-14 i86pc i386 i86pc
[20071015-08:38:53]:[root@clamour]:[~]# zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
3 control running /export/zones/control native shared
4 lunix running /export/zones/lunix lx shared
[20071015-08:38:56]:[root@clamour]:[~]# zlogin lunix
[Connected to zone 'lunix' pts/5]
Last login: Mon Oct 15 12:37:28 2007 from zone:global on pts/4
Linux lunix 2.4.21 BrandZ fake linux i686

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
lunix:~#

After I stop laughing hysterically, visions of collapsing Linux boxes into Solaris zones dancing through my twitching little mind, I'll have to see how twitchy the install itself is. Already it appears that some stuff is unhappy, though most of it seems to revolve around things that don't matter (ICMP oddities, console oddities wrt determining how smart it is for restarting services -sigh- and a few other easily surmountable or ignorable things).

Overall: Hello, awesome.

(Update: It appears that 6591535 makes this a non-starter. I am now, again, a very sad bda with a bunch of crappy hardware and nowhere to move their services to.)

November 1, 2007

So the "OpenSolaris Developer Preview" was released last night. I spent a few minutes with it, and it has generated a frankly ridiculous amount of controversy inside the community (to the point where my inbox has tripled in size). So what's the deal?

Well, it's actually a pretty decent first release. It has ZFS on root (awesome!), the Image Packaging System (which is way cool), and is almost as trivial to install as Ubuntu. People are whining about a bunch of nonsense (wah, the default shell, wah, no KDE in the first release, wah), but by far the biggest complaints center around Indiana taking the OpenSolaris name. This gets a big fat whatever from me.

I'm wondering if anyone complaining has actually read the FAQ.

Dennis Clarke has some screenshots over at Blastwave.

If you are thinking of giving it a try, you should probably read through the immigrants page. benr++

I installed it without issues in a Parallels VM. It panicked on boot a couple times, though I suspect that has more to do with Parallels than Solaris. I need to make an image of the compiler tools so I can get the NIC supported, but that is a pretty trivial thing. I suspect I will not bother and just build a workstation at the office and throw Indiana on there.

Overall I think this is a fairly exciting milestone for OpenSolaris. Their release schedule of every six months is encouraging, as it works very well for certain other high-quality projects. The barrier for adoption has fallen and now that code has been thrown over the wall, perhaps people can start contributing instead of ... not.

Well, once they get over accusing the Indiana guys of "stabbing the community in the back" and eating babies...

November 16, 2007

I have an OpenSolaris box in pilot at the moment, running build 74. It uses Lori Alt's patched miniroot so I can set up a rootpool and do a profile (network, via Jumpstart) install. It works really well.

Yesterday the box went into a reboot loop, and as there appear to be issues with b74, I figured I would finally get around to learning how to use BFU (which is a change-aware wrapper around cpio that writes to /; it's not something you can back out of). But before I did that, I would need to figure out how to boot from a ZFS clone. If the BFU goes south, or if the new build bricks the box, I need a way to boot back into the old system. It's the poor man's LiveUpgrade, I suppose, but it's still way cool and (I think) much easier.

So that's the goal here: Take a snapshot of the current system, clone the snapshot so it's writable, and then upgrade the clone. This way we can BFU the system and still have a fallback in the event that the BFU fails, or the new OS/Net build bricks our box.

Tim Foster had already written a blog post about how easy this was, so I wasn't expecting to run into any problems.

First, grab the ON build tools and the BFU archives for the build you care about.


[root@octopus]:[~]# cd /tmp
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/on-bfu-nightly-osol-nd.i386.tar.bz2

You probably want to do that in /tmp (which is swap) so when you take your snapshots, big random files are not littering the filesystem forever.

Set up your build environment:

[root@octopus]:[/tmp]# bunzip2 on-bfu-nightly-osol-nd.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf on-bfu-nightly-osol-nd.i386.tar
[root@octopus]:[/tmp]# bunzip2 SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf SUNWonbld.i386.tar 
[root@octopus]:[/tmp]# cd onbld/
[root@octopus]:[/tmp/onbld]# pkgadd -d . SUNWonbld 
[root@octopus]:[/tmp/onbld]# cd
[root@octopus]:[~]# export FASTFS="/opt/onbld/bin/i386/fastfs"
[root@octopus]:[~]# export GZIPBIN="/usr/bin/gzip"
[root@octopus]:[~]# export BFULD="/opt/onbld/bin/`uname -p`/bfuld"
[root@octopus]:[~]# export PATH="/opt/onbld/bin:/opt/onbld/bin/`uname -p`:$PATH"

Now we need to take a snapshot of our current rootfs, clone it so it is writable, and mount it. In my setup, the rootpool is a legacy mount, and anything under it is also going to inherit the legacy mount property.


[root@octopus]:[~]# zfs snapshot rootpool/b74@upgrade
[root@octopus]:[~]# zfs clone rootpool/b74@upgrade rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=/rootpool/b75 rootpool/b75

Now it's time to do the actual upgrade. I ran into two very minor snags here. First, I don't have BIND installed, so I needed to pass -f to bfu. Secondly, I don't have D-BUS installed, and had to comment that check out of the bfu script. Once that's done, it goes off and does its thing happily.

Once the BFU finishes you'll be put into a safe environment with tools built to work regardless of how horribly the BFU may have messed up your system (not an issue here, as we aren't actually modifying our current rootfs). As soon as it's done, you'll need to resolve the conflicts it lists; thus far I have not had an issue with using Automated Conflict Resolution to merge those files.


[root@octopus]:[~]# bfu -f /tmp/archives-nightly-osol-nd/i386 /rootpool/b75
bfu# /opt/onbld/bin/acr /rootpool/b75

And that's it. Your clone has now been upgraded using BFU. Create a boot archive of the new BE and set it legacy again.


[root@octopus]:[~]# bootadm archive-update -R /rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=legacy rootpool/b75
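
For reference, the whole sequence so far can be written down as an ordered plan. The Python sketch below only builds the list of commands and prints them; it does not run anything. The function name and its parameters are made up for this example, the commands themselves are the ones shown above, and the -f flag is only there because of the missing BIND on my box.

```python
from typing import List


def bfu_clone_plan(pool: str, old_be: str, new_be: str, archives: str) -> List[List[str]]:
    """Return, in order, the commands used above to BFU a clone of old_be.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    snapshot = f"{pool}/{old_be}@upgrade"
    clone = f"{pool}/{new_be}"
    mountpoint = f"/{clone}"
    return [
        # Snapshot the running BE and make a writable clone of it.
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, clone],
        # Temporarily give the clone a real mountpoint so bfu can write to it.
        ["zfs", "set", f"mountpoint={mountpoint}", clone],
        # -f was needed in my case because BIND is not installed.
        ["bfu", "-f", archives, mountpoint],
        # acr is run from the shell that bfu drops you into.
        ["/opt/onbld/bin/acr", mountpoint],
        # Rebuild the boot archive, then hand the clone back to legacy mounting.
        ["bootadm", "archive-update", "-R", mountpoint],
        ["zfs", "set", "mountpoint=legacy", clone],
    ]


if __name__ == "__main__":
    plan = bfu_clone_plan("rootpool", "b74", "b75", "/tmp/archives-nightly-osol-nd/i386")
    for argv in plan:
        print(" ".join(argv))
```

Keeping the plan as data makes it easy to eyeball the order before touching a real pool; actually running it (and handling the interactive bfu shell) is deliberately left out.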

You have a couple options for managing your boot environments at this point. You can either modify /rootpool/boot/grub/menu.lst yourself, or use Tim Foster's zfs-bootadm.sh to do it for you. The script relies on a property to determine which zfs fs are bootable, so you'll need to set that.


[root@octopus]:[~]# ./zfs-bootadm.sh
Usage: zfs-bootadm.sh [command]

where command is one of:
create
Creates a new bootable dataset as a clone
of the existing one.
activate
Sets a bootable dataset as the next
dataset to be booted from.
destroy
Destroys a bootable dataset. This must not
be the active dataset.
list
Lists the known bootable datasets.

[root@octopus]:[~]# zfs set bootable:=true rootpool/b75
[root@octopus]:[~]# ./zfs-bootadm.sh list
b74 (current)
b75
test
[root@octopus]:[~]# ./zfs-bootadm.sh activate b75
Currently booted from bootable dataset rootpool/b74
On next reboot, bootable dataset rootpool/b75 will be activated.
[root@octopus]:[~]# reboot

The box reboots, and...


[bda@moneta]:[~]$ ssh root@octopus
Last login: Fri Nov 16 02:50:21 2007 from 10.10.1.20
Sun Microsystems Inc. SunOS 5.11 snv_75 October 2007
bfu'ed from /tmp/archives-nightly-osol-nd/i386 on 2007-11-16
Sun Microsystems Inc. SunOS 5.11 snv_74 October 2007
[root@octopus]:[~]# uname -a
SunOS octopus 5.11 snv_75 i86pc i386 i86pc

Pretty dang cool stuff!

My initial test here was to BFU from b74 to b76. There was some fumbling about with where the menu.lst file was (I knew it was stored on the rootpool from reading Lori Alt's weblog and various presentations, but rootpool was a legacy mount, so stupid tired me was confused for a good ten minutes). The BFU and acr themselves appeared to be fine, and when I finally got the BE to boot, it... panicked.

I was somewhat discouraged, but booted right back into b74 and BFU'd happily to b75.

Which was the entire point of the exercise: To upgrade the system and have a safe way to fall back to a previous build if the system becomes unusable. As I said, it's the poor man's LiveUpgrade, but LU doesn't currently support zfsboot. And, really, this just seems much quicker and easier to deal with.

There are plenty of little things to figure out still (like which filesystems are required to be on the BE for the BFU to work, so I don't end up with data being snapshotted forever), how to deal with package upgrades, and the like. But overall... very, very cool.

Another thing to note is that everything above was gleaned not just from documentation but from the blogs of the developers.

November 22, 2007
December 15, 2007

I have been a big fan of Patch Check Advanced, as it makes patching Solaris systems not an incredible pain in the ass.

Noted on the news section there is pcapatch, which evidently aims to safely automate pca patch installation.

I suspect I might be a big fan of that as well.

December 16, 2007

To be clear, my understanding of everything I'm about to say is very basic. It's all built on implementing work others did a few months ago, and reading up last night and this morning. If I say something ridiculous, I call nubs.

(As an aside, it appears that TCL supports USDT probes; news to me!)

Bryan Cantrill mailed me the other day after finding my previous post regarding DTrace and Perl via a post by Sven Dowideit. Bryan noted that Alan's patch pre-dated Adam Leventhal's work on is-enabled probes, which are highly useful for dynamic languages: Code is only executed when DTrace is actively tracing a given probe.

When it isn't, there should be no perf hit; the caveat seems to be that when tracing is enabled, the hit with is-enabled probes is going to be higher than with the previous standard static probes.

In the current state of DTrace in Perl (as far as I am aware), there are only two probes: sub-entry and sub-return. Compare to Joyent's work on Ruby, which has about a dozen (the diff for Ruby is over 20,000 lines, though, so obviously there's a lot more going on than just throwing some USDT probes in). When you are only interested in seeing which objects are being destroyed, for instance, you don't want to have the function probes toggled.

So this morning after reading a very helpful USDT example, I went ahead and modified Alan Burlison's patch for is-enabled probes.


[20071216-10:56:50]:[bda@drove]:[~/dtrace/perl]$ diff -u perl-5.8.8-dt-alanb/cop.h perl-5.8.8-isenabled/cop.h
--- perl-5.8.8-dt-alanb/cop.h Sat Dec 15 17:15:14 2007
+++ perl-5.8.8-isenabled/cop.h Sun Dec 16 10:56:49 2007
@@ -126,6 +126,7 @@
* decremented by LEAVESUB, the other by LEAVE. */

#define PUSHSUB_BASE(cx) \
+ if (PERL_SUB_ENTRY_ENABLED()) \
PERL_SUB_ENTRY(GvENAME(CvGV(cv)), \
CopFILE((COP*)CvSTART(cv)), \
CopLINE((COP*)CvSTART(cv))); \
@@ -180,6 +181,7 @@

#define POPSUB(cx,sv) \
STMT_START { \
+ if (PERL_SUB_RETURN_ENABLED()) \
PERL_SUB_RETURN(GvENAME(CvGV((CV*)cx->blk_sub.cv)), \
CopFILE((COP*)CvSTART((CV*)cx->blk_sub.cv)), \
CopLINE((COP*)CvSTART((CV*)cx->blk_sub.cv))); \

Yeah, that was really it. I know, right?

So, now, what do my numbers look like for running the Perl test suite?

Note that all I'm doing is firing on sub-entry and sub-return with no other processing, in destructive mode (otherwise DTrace bottoms out due to systemic unresponsiveness).

static libperl, unpatched:

real 5m42.162s
user 2m28.597s
sys 0m30.161s

dynamic libperl, unpatched:

real 6m31.771s
user 3m16.823s
sys 0m31.698s

dynamic libperl, patched, standard probes, not instrumented:

real 6m33.610s
user 3m12.911s
sys 0m33.445s

dynamic libperl, patched, standard probes, instrumented:

real 9m1.302s
user 3m15.186s
sys 2m47.087s

dynamic libperl, patched, is-enabled probes, not instrumented:

real 6m44.823s
user 3m18.589s
sys 0m43.765s

dynamic libperl, patched, is-enabled probes, instrumented:

real 9m27.597s
user 3m16.791s
sys 3m6.972s

Not that big of a difference, really.

What's really interesting (to me, anyway) about the above is how the dynamic libperl build and both sets of patches take basically the same amount of time to complete. In my previous "tests", the patched build took an extra ~40s, as opposed to 10s here. Here I am using Sun Studio 12; previously I had been using gcc. I imagine that might make a difference.
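
To see where the time goes, here is a quick Python sketch that subtracts the unpatched dynamic libperl run from each of the patched runs listed above. The labels and helper names are made up for this example, and each build was only timed once, so small differences are well within the noise.

```python
from typing import Dict, Tuple


def secs(minutes: int, seconds: float) -> float:
    """Turn the minutes and seconds printed by time(1) into plain seconds."""
    return minutes * 60 + seconds


# (real, user, sys) for each single run, copied from the numbers above.
RUNS: Dict[str, Tuple[float, float, float]] = {
    "dynamic, unpatched": (secs(6, 31.771), secs(3, 16.823), secs(0, 31.698)),
    "standard, idle": (secs(6, 33.610), secs(3, 12.911), secs(0, 33.445)),
    "standard, traced": (secs(9, 1.302), secs(3, 15.186), secs(2, 47.087)),
    "is-enabled, idle": (secs(6, 44.823), secs(3, 18.589), secs(0, 43.765)),
    "is-enabled, traced": (secs(9, 27.597), secs(3, 16.791), secs(3, 6.972)),
}

BASELINE = "dynamic, unpatched"


def delta(build: str) -> Dict[str, float]:
    """Seconds added (or saved) by `build` relative to the unpatched dynamic run."""
    base = RUNS[BASELINE]
    run = RUNS[build]
    return {name: run[i] - base[i] for i, name in enumerate(("real", "user", "sys"))}


if __name__ == "__main__":
    for build in RUNS:
        if build == BASELINE:
            continue
        d = delta(build)
        print(f"{build:20s} real {d['real']:+8.3f}s  user {d['user']:+8.3f}s  sys {d['sys']:+8.3f}s")
```

On these single runs, tracing adds about 150s (standard probes) and about 176s (is-enabled probes) of wall time, and most of that shows up as sys time; user time barely moves.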

I suspect, though, that a number of further factors are at play: the fact that the Perl test suite's behavior is (hopefully?) nothing remotely akin to what you'd see in production, the fact that we're only instrumenting a single set of probes as opposed to having entry points in other places for comparison... Most importantly, though, I imagine whatever changes were made to Ruby might have analogies here as well.

Still, I'm interested enough now to start digging through Joyent's Ruby diff and investigating Perl's internals to determine other probe points.

Maybe in a week or so I'll have something worth showing off to #p5p as Rik suggests.

Or my C ignorance will bite me horribly and I'll be forced to commit seppubukkake to save.. face?

January 3, 2008

Ben Rockwood expounds upon the joys of IPMI.

As someone who was only using it to reboot his systems (and configure the SP when I'd forgotten to do so during build), it's a pretty enlightening article.

January 20, 2008

Transactional Debian Upgrades with ZFS on Nexenta

Bloody amazing is what that is. Not because the concept is revolutionary (it's been possible with hacked ONNV installs for a while now, Indiana is doing something similar, and a few "other" operating systems have had similar capabilities), but because it's integrated and the interface itself is so obvious. It looks as easy to use as apt(8) and zfs(1M).

Very exciting stuff.

April 26, 2008

Ben Rockwood digs into some odd disk activity using your friendly neighborhood Solaris tools.

June 2, 2008

The other day I ran into an issue where bootstrapping pkgsrc 2008q1 would hang while running bmake regression tests.

The fix is here.

June 14, 2008

< tiziano84> Hi
< tiziano84> How can I cane make an "online update" of openSolaris ?
< tsang> put your computer on a line and proceed with the update
< CosmicDJ> download a newer release and liveupgrade it :)
< e^ipi> let's go back to first principles here, shall we
< e^ipi> which distro are you using?
< e^ipi> the one with the bubbles, or the other one?

June 23, 2008

[20080622-13:36:33]:[bda@mako]:[~]$ pfexec pkg refresh
[20080622-13:43:58]:[bda@mako]:[~]$ pfexec pkg install pkg:/SUNWipkg@0.5.11,5.11-0.91
DOWNLOAD PKGS FILES XFER (MB)
Completed 1/1 93/93 0.84/0.84

PHASE ACTIONS
Removal Phase 2/2
Update Phase 87/87
Install Phase 9/9
[20080622-13:48:44]:[bda@mako]:[~]$ pfexec pkg image-update
DOWNLOAD PKGS FILES XFER (MB)
Completed 547/547 5585/5585 504.11/504.11

PHASE ACTIONS
Removal Phase 3098/3098
Update Phase 7617/7617
Install Phase 3367/3367
A clone of opensolaris-1 exists and has been updated and activated. On next boot the Boot Environment opensolaris-2 will be mounted on '/'. Reboot when ready to switch to this updated BE.
[20080622-13:52:38]:[bda@mako]:[~]$ beadm list

BE Active Active on Mountpoint Space
Name reboot Used
---- ------ --------- ---------- -----
opensolaris-2 no yes - 5.25G
opensolaris-1 yes no legacy 89.5K
opensolaris no no - 59.10M
[20080622-13:52:41]:[bda@mako]:[~]$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 7.18G 63.7G 61K /rpool
rpool@install 19.5K - 55K -
rpool/ROOT 5.31G 63.7G 18K /rpool/ROOT
rpool/ROOT@install 15K - 18K -
rpool/ROOT/opensolaris 59.1M 63.7G 2.41G legacy
rpool/ROOT/opensolaris-1 89.5K 63.7G 2.57G legacy
rpool/ROOT/opensolaris-1/opt 0 63.7G 595M /opt
rpool/ROOT/opensolaris-2 5.25G 63.7G 2.78G legacy
rpool/ROOT/opensolaris-2@install 5.83M - 2.22G -
rpool/ROOT/opensolaris-2@static:-:2008-06-09-19:03:02 110M - 2.41G -
rpool/ROOT/opensolaris-2@static:-:2008-06-22-17:17:20 532M - 2.57G -
rpool/ROOT/opensolaris-2/opt 595M 63.7G 595M /opt
rpool/ROOT/opensolaris-2/opt@install 72K - 3.60M -
rpool/ROOT/opensolaris-2/opt@static:-:2008-06-09-19:03:02 0 - 595M -
rpool/ROOT/opensolaris-2/opt@static:-:2008-06-22-17:17:20 0 - 595M -
rpool/ROOT/opensolaris/opt 33K 63.7G 595M /opt
rpool/data 18K 63.7G 18K /rpool/data
rpool/export 1.87G 63.7G 19K /export
rpool/export@install 15K - 19K -
rpool/export/home 1.87G 63.7G 1.87G /export/home
rpool/export/home@install 19K - 21K -
[20080622-13:52:51]:[bda@mako]:[~]$ init 6

Well... that's easy.

August 7, 2008

Building A Solaris Cluster Express Cluster in VirtualBox

Pretty interesting stuff. VBox on OS X is not incredibly useful to me (the lack of host networking is a killer), but I run OpenSolaris on my desktop at work.

Very cool stuff.

So for a while now I've been struggling with an older Xeon system which, under a moderate amount of I/O load, becomes more and more unresponsive until it finally hangs.

I asked zfs-discuss@ about it, and received a very helpful response from Marc Bevand.

Now the kernel heap bounces between 1.2GB (idle) and 1.4GB (loaded). The ARC has maxed around 400MB, but I haven't been doing any major reads off the box yet, just a lot of write I/O, so I don't think that's particularly surprising.

Yay.

(This experience really reminds me that I need to re-read Solaris Internals. I could have solved this problem myself, if I refreshed on those books periodically.)

August 13, 2008

So this morning has been... annoying.

A box was rebooted and didn't come back up. Network came up (pingable) but not ssh. Based on previous idiocy with this system, I suspected it had something to do with filesystems not being able to mount at boot. I shot off a mail to the NOC monkeys, not expecting much (and four hours later, still no response from them), and then started trying to get into the system myself to fix it.

The box in question is a Sun X4150; a really nice system (though now that I've had a T5120 for a while, I have to say I really do much prefer SPARCs simply for ease of administration), with a really lame-ass LOM (ELOM). But: Whatever. So I go to start the console via the LOM... no joy. Apparently console is not redirecting. So, ok, I should be able to get at glass (thanks for the reminder, dlg) via the web interface.

Of course there's no VPN at that site. So I kick open netcat and don't have much in the way of luck. After a few minutes of screwing around with it, I give up and download haproxy. In about three minutes I have it compiled, configured, and forwarding :80 and :443 for me.

listen proxy1 0.0.0.0:80
        mode http
        balance roundrobin
        server test 192.168.11.10:80
        contimeout 3000
        clitimeout 150000
        srvtimeout 150000
        maxconn 60000
        redispatch
        retries 3
        grace 3000
        option forwardfor
        option httplog
        option dontlognull

listen ssl-relay 0.0.0.0:443
        option ssl-hello-chk
        balance source
        server inst1 192.168.11.10:443
I log into the LOM, start the redirection Java app, and... nothing.

And... Mac OS X Java bullshit.

So I start an old Parallels OpenSolaris image I had laying around, connect to the LOM that way and... get an I/O connection error. Figuring that the KVM was running on another port, I sniffed off my firewall and discovered that yes, it wanted :8890 as well.

listen ssl-relay 0.0.0.0:8890
        option ssl-hello-chk
        balance source
        server inst1 192.168.11.10:8890

Did that, got into the box and discovered the problem was...

[20080813-05:43:12]:[root@brood]:[~]# tail -2 /etc/vfstab
/dev/zvol/dsk/data/zones/lb-arc/root /dev/zvol/rdsk/data/zones/lb-arc/root /zones/lb-arc ufs 1 yes logging
/dev/zvol/dsk/data/zones/lb-arc/root /dev/zvol/rdsk/data/zones/lb-arc/root /zones/lb-arc ufs 1 yes logging

ugh.

A svcadm clear filesystem/local later, and all was well.

sigh.
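
A check along these lines would have caught the duplicate entry before the reboot. It is a minimal Python sketch, assuming the standard seven-field vfstab layout where the third field is the mount point; the function name is made up, and the sample text is just the two lines from the tail output above.

```python
from collections import Counter
from typing import List

# The offending entries, as shown in the tail output above.
SAMPLE = """\
/dev/zvol/dsk/data/zones/lb-arc/root /dev/zvol/rdsk/data/zones/lb-arc/root /zones/lb-arc ufs 1 yes logging
/dev/zvol/dsk/data/zones/lb-arc/root /dev/zvol/rdsk/data/zones/lb-arc/root /zones/lb-arc ufs 1 yes logging
"""


def duplicate_mount_points(vfstab_text: str) -> List[str]:
    """Return every mount point that appears more than once in vfstab text."""
    counts: Counter = Counter()
    for line in vfstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point = fields[2]
        # Entries such as swap use "-" because they have no mount point.
        if mount_point == "-":
            continue
        counts[mount_point] += 1
    return sorted(mp for mp, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    for mount_point in duplicate_mount_points(SAMPLE):
        print(f"duplicate vfstab entry for {mount_point}")
```

Pointing it at the contents of /etc/vfstab instead of the sample string is the obvious next step; it only flags repeated mount points and says nothing about whether the devices themselves exist.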

August 27, 2008

Recently I moved our x86-64 pkgsrc build zone to another system. When I did so, I had forgotten I had built the original zone as full, to get around an annoying install(1M) bug. Basically, when you tried to build a package, it would attempt to recursively mkdir /usr/pkg. On sparse zones, /usr is shared read-only from the global zone.

So the install would fail, because it couldn't create /usr for obvious reasons. At the time, I thought I had tried various install programs, but given that the problem was being re-addressed and I didn't feel like reprovisioning a zone, I figured I would tackle it again.

After some minor discussion on #pkgsrc and grepping through mk/ I "discovered" the following variable:

TOOLS_PLATFORM.install?= /usr/pkg/bin/ginstall

Added to mk.conf and all is good. Mainly because ginstall actually uses mkdir -p, so...

The contents of pkgsrc/mk/platform/ are very useful if you aren't on NetBSD.

November 1, 2008

Solaris 10 10/08 (Update 6) was released yesterday. Release notes here.

I grabbed SPARC media and headed down to the colo yesterday to reinstall our T5120 (previously running b93). Fed the media in, consoled in via the SP, booted the system, and then left.

From much more comfortable environs, I got the system installed (service processors really are the best thing ever) without issue, and then, thanks to hilarity with my laptop, lost the randomized password I'd set for root. So whatever, I boot single-user and ... get asked for root's password. This is very similar to most Linux single-user boots these days, and more recently OpenSolaris.

I really, really didn't expect Solaris to follow suit. At least not for .. a while.

Very annoying. At dlg's suggestion, I tried booting -m milestone=none, but still had no joy. Ended up just booting cdrom -s and munging /etc/shadow that way.

Very annoying.

Anyway, having ZFS root in Solaris GA is pretty great. There are a number of really awesome features putback from Nevada this release, along with zfsboot. Check out the release notes. Good stuff.

UPDATE

Ceri Davies corrects me:

Just a note, because it sounds as if you think otherwise, that this behaviour has been present since at least update 3; ie. at least two years. You can turn it off by creating /etc/default/sulogin with the line PASSREQ=NO.

I don't recall seeing this behavior with u4 or u5, so evidently I am a crazy person. Thanks to Ceri for the info.

See sulogin(1M) for further details.

November 12, 2008

Finally got around to doing a Jumpstart for 10/08 today. After one little hitch (u6 renames the cNdN devices in my X2100s to the more proper cNtNdN), it all worked as expected.

fdisk c1t0d0 solaris delete
fdisk c1t1d0 solaris delete

fdisk c1t0d0 solaris all
fdisk c1t1d0 solaris all

install_type initial_install
pool rpool auto auto auto mirror c1t0d0s0 c1t1d0s0
bootenv installbe bename sol10u6

Yay, ZFS root!

December 12, 2008
December 27, 2008
March 1, 2009

Over the last two weeks we (read: rjbs) migrated our Subversion repositories to git on GitHub. I was not very pleased with this for the first week or so. By default, I am grumpy when things that (to me) are working just fine are changed, especially at an (even minor) inconvenience to me. That is just the grumpy, beardy sysadmin in me.

After a bit more talking to by rjbs, things are again smooth sailing. I can do the small amount of VCS work I need to do, and more importantly: I am assured things I don't care about will make the developers' lives much, much less painful, which is something I am certainly all for.

git is much faster than Subversion ever was, and I can see some features as being useful to me eventually. Overall, though, what I use VCS for is pretty uninteresting, so I don't have much else to say about it.

I had a couple basic mental blocks that rjbs was able to explain away in a 20 minute talk he gave during our bi-weekly iteration meeting. It was quite productive. There are pictures.

Work has otherwise consisted of a lot of consolidation. I have finally reduced the number of horrible systems to two. Yes. Two. Both of which are slated for destruction in the next iteration. Not only that, I have found some poor sucker (hi, Cronin!) to take them all off our hands. Of course, they'll be upgrading from PIIIs, so...

I also cleaned up our racks. A lot. They are almost clean enough to post pictures of, though I'll wait until I've used up more of the six rolls of velcro Matt ordered before doing that.

Pretty soon we'll have nothing but Sun, a bit of IBM, and a very small number of SuperMicros. My plans are to move our mail storage from the existing SCSI arrays to a Sun J4200 (hopefully arriving this coming week). 6TB raw disk, and it eats 3.5" SATA disks, which are ridiculously cheap these days. I really, really wanted an Amber Roads (aka OpenStorage) J7110, but at 2TB with the cost of 2.5" SAS, it was impossible to justify. If they sold a SATA version at the low-end... there has been some noise about conversion kits for Thumpers, but that's also way outside our price range.

I doubt conversion support will become more common, but if I could turn one of our X4100s and the J4200 into an OpenStorage setup, I would be incredibly happy. If you haven't tried out the OpenStorage Simulator, I suggest you do so. Analytics is absolutely amazing.

People on zfs-discuss@ and #opensolaris have been talking about possible GSoC projects. I suggested a zpool/filesystem "interactive" attribute, or "ask before destroy." However you want to think of it. Someone else expanded on that, suggesting that -t be allowed to ensure that only specified resource types can be destroyed. I have yet to bone myself with a `zfs destroy` or `zpool destroy` but the day will come, and I will cry.
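
In the meantime, the "ask before destroy" idea can be approximated with a shell wrapper. This is only a sketch of the concept, not anything ZFS provides: confirm_destroy and the ZFS_BIN override are names made up for illustration.

```shell
#!/bin/sh
# Sketch of "ask before destroy" as a wrapper; not a real ZFS feature.
# confirm_destroy and ZFS_BIN are hypothetical names used for illustration.

ZFS_BIN=${ZFS_BIN:-zfs}

# confirm_destroy DATASET
# Makes the operator retype the dataset name before destroying it.
confirm_destroy() {
    target=$1
    if [ -z "$target" ]; then
        echo "usage: confirm_destroy dataset" >&2
        return 2
    fi
    printf 'Really destroy %s? Retype the name to confirm: ' "$target" >&2
    read -r answer || answer=
    if [ "$answer" != "$target" ]; then
        echo "not destroying $target" >&2
        return 1
    fi
    "$ZFS_BIN" destroy "$target"
}
```

Retyping the name rather than answering y/n is deliberate: it is much harder to do on autopilot.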

I see a pkgsrc upgrade in my near future. I've been working on linking all our Perl modules against it, and I want to get the rest of our internal code linking against it as well. It will make OS upgrades so, so much easier. Right now, most code is either linked to OS libraries or to an internal tree (most of which also links to OS libraries).

We've almost gotten rid of all our Debian 3.1 installs, which is... well. You know. Debian 5.0 just came out, and we've barely gotten moved to 4.0 yet. Getting the upgrade path there sorted out will thankfully just be tedious, and require nothing clever.

I really hope that the Cobbler guys get Debian partitioning down soon, and integrate some Solaris support. I tried redeploying FAI over Christmas and man, did it so not work out of the box. I used to use FAI, and was quite happy with it. I had to hack it up, but... it worked pretty well. Until it stopped.

If Cobbler had Solaris support, I would seriously consider moving our remaining Linux installs to CentOS. We use puppet already, so in many ways Cobbler is a no-brainer. We are not really tied to any particular Linux distribution, and having all our infrastructure under a single management tool's ken would be really nice. To put it mildly.

30% curious about OpenSolaris's Automated Installer project, but it's so far off the radar as to be a ghost.

I picked up John Allspaw's The Art of Capacity Planning, and it's next on my book queue. Flipping through it makes me think it's going to be as useful as Theo S.'s Scalable Internet Architectures.

April 1, 2009

So I have a device failing in one of my zpools:

extended device statistics ---- errors ---
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 fd0
0.0 2.0 0.0 8.0 0.0 0.0 0.0 0.1 0 0 1 0 0 1 c0t0d0
0.0 2.0 0.0 8.0 0.0 0.0 0.0 0.1 0 0 1 0 0 1 c0t1d0
0.0 0.0 0.0 0.0 0.0 10.0 0.0 0.0 0 100 1 3 4 8 c0t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c0t3d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t3d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t5d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c2t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c3t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 6 2 0 8 c4t0d0
extended device statistics ---- errors ---
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 fd0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c0t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c0t1d0
0.0 0.0 0.0 0.0 0.0 10.0 0.0 0.0 0 100 1 3 4 8 c0t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c0t3d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t3d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 c1t5d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c2t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c3t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 6 2 0 8 c4t0d0

etc...

It's part of a mirror:

pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c0t2d0  ONLINE       0     6     2
    c0t3d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t4d0  ONLINE       0     0     0
    c1t5d0  ONLINE       0     0     0

errors: No known data errors

So I reckon I'll just offline it and go replace it.

[20090401-17:20:12]::[root@shoal]:[~]$ zpool offline tank c0t2d0
cannot offline c0t2d0: no valid replicas
[20090401-17:31:15]::[root@shoal]:[~]$

err... what?

So I detach it from the mirror instead, which does work.

I ask jmcp if he has any insight into why this might be, and after a few minutes he asks if disconnecting the device works.

[20090401-18:01:57]::[root@shoal]:[~]$ cfgadm -c disconnect c0::dsk/c0t2d0
cfgadm: Hardware specific failure: operation not supported for SCSI device

So that's the culprit, I think. A disconnect is implicit when doing a zpool offline?

Not a good error to throw back to the user, either.
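
For reference, the detach route that did work looks roughly like the sketch below. The DRY_RUN switch and helper names are hypothetical, and it assumes the replacement disk shows up under the same device name as the one it replaces. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the detach-and-reattach route for a failing mirror half.
# DRY_RUN and the helper names are hypothetical; the device names are the
# ones from the zpool status output above.

DRY_RUN=${DRY_RUN:-1}

# Echo the command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Drop the failing disk from its mirror. The pool keeps running on the
# surviving half, without redundancy for that vdev.
detach_disk() {
    run zpool detach "$1" "$2"
}

# After swapping hardware, mirror the new disk against the surviving one.
# zpool attach takes: pool, existing device, new device.
attach_disk() {
    run zpool attach "$1" "$2" "$3"
}

detach_disk tank c0t2d0
attach_disk tank c0t3d0 c0t2d0
```

The attach kicks off a resilver, so zpool status is worth watching until it completes.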

April 8, 2009

I've been meaning to blog this for a while. Very useful in Jumpstart finish scripts.

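# Send the console to the second serial port at 115200 8N1
eeprom console=ttyb
eeprom ttyb-mode="115200,8,n,1,-"
# Tell the asy driver about COM2 (I/O 0x2f8, IRQ 3)
echo "name=\"asy\" parent=\"isa\" reg=1, 0x2f8 interrupts=3;" >> /kernel/drv/asy.conf
# Set the console login's ttymon label to 115200
svccfg -s system/console-login setprop ttymon/label = 115200
svcadm refresh system/console-login
svcadm restart system/console-login
# GRUB: drop the splash image and pass console=ttyb to the kernel.
# The \$ keeps Perl from treating $ZFS as one of its own variables.
perl -pi -e 's/^splashimage/#splashimage/' /rpool/boot/grub/menu.lst
perl -pi -e 's/\$ZFS-BOOTFS$/\$ZFS-BOOTFS,console=ttyb/' /rpool/boot/grub/menu.lst
bootadm update-archive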

reboot

April 16, 2009

A nice high-level writeup by OmniTI's Mark Harrison on Zones, ZFS, and Zetaback.

[via Theo S.]

July 1, 2009

Someone on Sun managers asked for advice on moving from Linux to Solaris and tips on living with Solaris in general. I guess I kind of have a lot to say about it, actually.

One thing I forgot to mention is using SMF. You may have two software repositories (Sun's and pkgsrc), but you only want one place to manage the actual services. Write SMF manifests! It's easy, and you can use puppet to manage it all.
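
To give an idea of how little is involved, here is a sketch that emits a bare-bones manifest for a daemon installed from pkgsrc. The service name (exampled) and its path are hypothetical placeholders, and it assumes the daemon forks into the background on its own; anything fancier needs more properties than shown here.

```shell
#!/bin/sh
# Sketch: emit a bare-bones SMF manifest for a pkgsrc-installed daemon.
# The service name and daemon path are hypothetical placeholders.

# smf_manifest NAME START_COMMAND
# Writes the manifest XML to stdout.
smf_manifest() {
    name=$1
    start=$2
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='$name'>
  <service name='site/$name' type='service' version='1'>
    <create_default_instance enabled='false'/>
    <single_instance/>
    <!-- wait for the network before starting -->
    <dependency name='network' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/milestone/network:default'/>
    </dependency>
    <exec_method type='method' name='start' exec='$start' timeout_seconds='60'/>
    <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
  </service>
</service_bundle>
EOF
}

# Typical use on the Solaris host (not run here):
#   smf_manifest exampled /usr/pkg/sbin/exampled > /var/svc/manifest/site/exampled.xml
#   svccfg import /var/svc/manifest/site/exampled.xml
#   svcadm enable site/exampled
```

Generating the XML from a function like this also makes it easy to template from whatever is driving your configuration management.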

From: Bryan Allen <bda@mirrorshades.net>
To: Jussi Sallinen
Cc:
Bcc:
Subject: Re: Looking for tips: Migrating Linux>Solaris10
Reply-To: bda@mirrorshades.net
In-Reply-To: <20090624113312.GA32749@unikko>
AIM: packetdump

+------------------------------------------------------------------------------
| On 2009-06-24 14:33:12, Jussi Sallinen wrote:
|
| Im new to Solaris and about to start migrating Linux (Gentoo) based E450 server
| to V240 Solaris 10.
|
| Currently running:
|
| -Apache2
| -Postfix
| -Dovecot
| -MySQL
|
| About 70 users using WWW and email services.
|
| So, to the point:
| In case you have tips and tricks, or good to know stuff please spam me with
| info regarding migration.

A quick note: I work for a company where I migrated all our services from Linux
on whiteboxes to Solaris 10 on Sun hardware. It was a major effort, but
garnered us many benefits:

* Consolidation. Thanks to the faster hardware and Zones, we are down from 50+
Linux boxes to a dozen Sun systems. And for honestly not that much money.
* Much greater introspection (not just mdb or DTrace; the *stat tools are
just that much better)
* Before ZFS, we were mostly sitting on reiserfs (before my time) and XFS
(which I migrated as much as I could to before getting it on ZFS). ZFS has
been a huge, huge win in terms of both reliability and availability.

This turned out to be quite an article, but here are some "quick" thoughts on
using Solaris particularly, and systems administration in general:

* Read the System Administrator Guides on docs.sun.com if you are new to
Solaris
* No, seriously. Go read them. They are incredibly useful and easy to parse.
* Follow OpenSolaris development, either via the mailing lists or #opensolaris
on freenode. This gives you a heads-up on stuff that might be getting into
the next Solaris 10 Update, so you can plan accordingly.

* Use a ZFS root instead of UFS (text installer only, but you really want to
use JET -- see below)
* Use rpool for operating system and zoneroots only
* Set up a tank pool on separate disks
* Delegate tank/filesystems to zones doing the application work

This minimizes the impact of random I/O on the root disks for data and vice
versa (just a good practice in general, but some people just try to use a
single giant pool).

It also avoids the case where a nearly full pool, spinning platters looking
for free blocks to write to, drags down both the operating system and the
application data.

* Use Martin Paul's pca for patching

The Sun patching tools all suck. pca is good stuff. You get security and
reliability patches for free from Sun; just sign up for a sun.com account.

You don't usually get new features from free patches (you do for paid patches),
but regardless all patches are included in the next system Update.

* Learn to love LiveUpgrade

With ZFS roots, LiveUpgrade became a lot faster to use. You don't have a real
excuse anymore for not building an alternative boot environment when you are
patching the system.

Some patches suck and will screw you. Being able to reboot back into your
previous boot environment is of enormous use.
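
A rough sketch of that workflow follows. The boot environment name, patch
directory and patch ID are placeholders, and DRY_RUN/run are hypothetical
helpers so that it only prints the commands by default:

```shell
#!/bin/sh
# Sketch: patch an alternate boot environment instead of the running one.
# The BE name, patch directory and patch ID are placeholders.

DRY_RUN=${DRY_RUN:-1}

# Echo the command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# patch_new_be BE PATCHDIR PATCH_ID...
patch_new_be() {
    be=$1
    patchdir=$2
    shift 2
    # Create the alternate BE from the running one
    run lucreate -n "$be" || return 1
    # Apply the patches to the inactive BE only
    run luupgrade -t -n "$be" -s "$patchdir" "$@" || return 1
    # Make the new BE the one used at next boot
    run luactivate "$be" || return 1
    echo "reboot with 'init 6' or shutdown; plain 'reboot' will not switch BEs"
}

patch_new_be patched-20090701 /var/tmp/patches 123456-01
```

If the patched environment misbehaves, luactivate the old one and init 6
again.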

* Use NetBSD's pkgsrc

Solaris 10 lacks a lot of niceties you and your users are going to miss.
screen, vim, etc. You can use Blastwave, but it has its own problems. pkgsrc
packages will compile basically everything without a problem; they are good
quality, easy to administer, and easy to upgrade.

If you aren't doing this on a single box, but several machines, you would have
a dedicated build zone/host, and use PKG_PATH to install the packages on other
systems. Since you are using a single machine, see below about loopback
mounting the pkgsrc directory into zones: Compile once, use everywhere.

The services you listed are available from pkgsrc and work fine. The one thing
you might want to consider instead is using Sun's Webstack and the MySQL
package, as they are optimized for Solaris and 64bit hardware.

In addition to the above, we use pkgsrc on our (dwindling number of) remaining
Linux hosts. It means we have a *single version* of software that may be
running on both platforms. It segments the idea of "system updates" and
"application updates" rather nicely with little overhead.

* Use Solaris Zones

Keep the global zone as free of user cruft as possible. If you segment your
services and users properly, zones make it incredibly easy to see what activity
is going on where (prstat -Z).

It also makes it easy to manage resources (CPU, RAM) for a given set of
services (you can do this with projects also, but to me it's easier to do at
the zone level).

Install all your pkgsrc packages in the global zone and loopback mount it in
each zone. This saves on space and time when upgrading pkgsrc packages. It also
means you have one set of pkgsrc packages to maintain, not N. It's the same
concept as...

* Use Sparse Zones

They are faster to build, patch and manage than full root zones. If you have
recalcitrant software that wants to write to something mounted read-only from
the global zone, use loopback mounts within the global zone to mount a zfs
volume read-write to where it wants (e.g., if something really wants to write
to /usr/local/yourface).

I also install common software in the global zone (e.g., Sun's compiler,
Webstack or MySQL) and then loopback mount the /opt directory into each zone
that needs it (every zone gets SSPRO).

* Delegate a ZFS dataset to each zone

This allows the zone administrator to create ZFS filesystems inside the zone
without asking the global admin. Something like rpool/zones/www1/tank. It's
easier to manage programmatically too, if you are using something like Puppet
(see below) to control your zones. You only have to edit a single class (the
zones) when migrating the zone between systems.
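
The loopback mounts and the dataset delegation described above both come down
to short zonecfg calls from the global zone. A sketch, with an example zone
name (www1) and hypothetical DRY_RUN/run helpers so nothing is changed unless
you ask for it:

```shell
#!/bin/sh
# Sketch: zonecfg calls for a read-only loopback mount and a delegated dataset.
# Zone name, paths and dataset name are examples; run from the global zone.

DRY_RUN=${DRY_RUN:-1}

# Echo the command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Loopback-mount DIR from the global zone, read-only, at the same path in ZONE.
zone_add_lofs() {
    zone=$1
    dir=$2
    run zonecfg -z "$zone" \
        "add fs; set dir=$dir; set special=$dir; set type=lofs; add options [ro,nodevices]; end"
}

# Create DATASET and hand it to ZONE, whose admin can then create
# filesystems beneath it without involving the global admin.
zone_delegate_dataset() {
    zone=$1
    dataset=$2
    run zfs create "$dataset" || return 1
    run zonecfg -z "$zone" "add dataset; set name=$dataset; end"
}

zone_add_lofs www1 /opt
zone_delegate_dataset www1 rpool/zones/www1/tank
```

Both changes take effect the next time the zone boots.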

* Use ZFS Features

No, really. Make sure your ZFS pools are in a redundant configuration! ZFS
can't automatically repair file errors if it doesn't have another copy of the
file.

But: ZFS does more for you than just checksumming your data and ensuring it's
valid. You also have compression, trivial snapshots, and the ability to send
those snapshots to other Solaris systems.

Writing a script that snapshots, zfs sends | ssh host zfs recvs is trivial. I
have one in less than 50 lines of shell. It gives you streaming, incremental
backups with basically no system impact (depending on your workload,
obviously).
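
This is not my actual script, just a minimal sketch of the idea. The function
name, the DRY_RUN switch, and the dataset and host names are all made up, and
it assumes you track the name of the last snapshot that made it to the other
side yourself:

```shell
#!/bin/sh
# Sketch: snapshot a filesystem and stream it to another host.
# Names and the DRY_RUN switch are hypothetical.
# Dataset and host names must not contain whitespace (the pipeline is eval'd).

DRY_RUN=${DRY_RUN:-1}

# replicate FS PREV NEW HOST DEST
#   PREV is the last snapshot already on HOST ("" for a first, full send).
replicate() {
    fs=$1
    prev=$2
    new=$3
    host=$4
    dest=$5

    snap="zfs snapshot $fs@$new"
    if [ -n "$prev" ]; then
        send="zfs send -i $fs@$prev $fs@$new"   # incremental: changes since PREV
    else
        send="zfs send $fs@$new"                # full stream
    fi
    pipe="$send | ssh $host zfs recv -F $dest"

    if [ "$DRY_RUN" = "0" ]; then
        $snap && eval "$pipe"
    else
        echo "$snap"
        echo "$pipe"
    fi
}

replicate tank/mail 20090630 20090701 backuphost backup/mail
```

The -F on the receiving side rolls the target back to its most recent
snapshot first, so stray changes on the backup host don't break the next
incremental.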

Note that if disk bandwidth is your major bottleneck, enabling compression can
give you a major performance boost. We had a workload constantly rewriting
30,000 sqlite databases, each between 5MB and 2GB (a rewrite reads the file
into memory, creates temp files, and writes the entire file back to disk). It
was incredibly slow until I enabled compression, which gave us a 4x write
boost.

You can also delegate ZFS filesystems to your users. This lets them take a
snapshot of their homedir before they do something scary, or whatever.

* Use the Jumpstart Enterprise Tool

Even though you only have one Solaris system, if you're new to Solaris, the
chances are you're going to screw up your first couple installs. I spent months
trying to get mine just the way I wanted. And guess what, installing Solaris is
time-consuming and boring.

Using JET (a set of wrappers around Jumpstart, which can also be annoying to
configure), you have a trivial way of reinstalling your system just the way you
want. I run JET in a virtual machine, but most large installs would have a
dedicated install VLAN their install server is plugged into.

Solaris installs have a concept of "clusters", which define which packages are
installed. I use RNET, the smallest one. It basically has nothing. I tell JET to
install my extra packages, and the systems are configured exactly how I want.

You use the finish scripts to do basic configuration after the install, and
to configure the *rest* of the system and applications, you...

* Use a centralized configuration management tool

I use Puppet. It makes it trivial to configure the system programmatically,
manage users and groups, and install zones. It's a life and timesaver. In
addition to making your system configuration reproducible, it *documents* it.

Puppet manages both our Solaris and Linux boxes, keeping each in a known,
documented configuration. It's invaluable.

I also store all my user skel in source control (see next), and distribute them
with Puppet. Users may be slightly annoyed that they have to update the
repository whenever they want to change ~/.bash_profile, but it will be the
same on *every* host/zone they have access to, without them doing any work,
which will make them very happy.

* Store your configs in a source control manager

Both your change management and your system configuration should all be
versioned. Usefully, you can use your change management to manage your system
configs!

We have an internal directory called /sw where we deploy all our software to.
Older services have configs hard-coded to other locations, so we use Puppet to
ensure symlinks exist as appropriate. We deploy to /sw with a script that
checks the tree out of git and rsyncs it to all machines. It's pretty trivial,
and very useful if you have more than, say, two hosts.
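
Something along these lines would do it. This is a sketch rather than our
real script: the checkout path and host names are hypothetical, and the
DRY_RUN/run helpers are only there so it prints what it would do by default.

```shell
#!/bin/sh
# Sketch: update a checkout of the repo and push it to /sw on every host.
# The checkout path and host names are hypothetical.

DRY_RUN=${DRY_RUN:-1}

# Echo the command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# deploy_sw CHECKOUT HOST...
deploy_sw() {
    checkout=$1
    shift
    run git -C "$checkout" pull --ff-only || return 1
    for host in "$@"; do
        # --delete keeps /sw identical to the repo; .git itself is not shipped
        run rsync -a --delete --exclude .git "$checkout/" "$host:/sw/" || return 1
    done
}

deploy_sw /var/tmp/sw-checkout host1 host2
```

Stopping at the first failed host is a choice; for a larger fleet you may
prefer to carry on and report the failures at the end.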

/sw is also a loopback mount into every zone, and read-only. It enforces the
idea that all config changes must go into the repository, *not* be changed
locally... because developers can't write to /sw just to fix something quickly.

* Solaris Sucks At: Logging, by default

The default logging setup is awful. Install syslog-ng from pkgsrc, and write
your logs to both a remote syslog server and the local disk (enable compression
on your logs ZFS filesystem!)

* Solaris Sucks At: Firewalling

ipf is a pain in the butt. Unless you absolutely have to do host-based
firewalling, set up an OpenBSD system and use pf.

...

I'm sure I could think of quite a lot more (DTrace, Brendan Gregg's DTrace
Toolkit, RBAC, mdb), but it's dinnertime. :)

Hopefully the above will prove somewhat useful!
--
bda
cyberpunk is dead. long live cyberpunk.


August 14, 2009

Co-worker asked for this. After a few minutes poking at the Makefile, I just googled and hit this page which gave me what I needed.

Yay for lazyweb.

August 29, 2009

Our build files live on a Solaris 10 NFS server. The build client lives in a zone on a separate host. The build files are exported via v3 and tcp to the client.

Periodically the client would hang and require a zone reboot. Needless to say, this was astoundingly annoying if you didn't realize it had hung until you had started your build or push processes. An init 6 always fixed it... for a while.

In snoop on the NFS server, a bunch of tcp:664 packets came in and went... nowhere. They hit the interface and vanished. Gee, I thought. That's odd.

Finally I got sick of this, and Googled around and found some references to port 623, a Linux bug that sounded pretty similar, and other Solaris users experiencing the same problem.

The first post is really the most useful. Different port, but same behavior.

After creating the rmcp dummy service in inetd, and restarting the zone, the problem has not resurfaced.

It's pretty interesting that this particular bug manifests because a chip on the motherboard eats traffic silently. "Interesting", anyway.