"That which is overdesigned, too highly specific, anticipates outcome; the anticipation of outcome guarantees, if not failure, the absence of grace."
-- William Gibson, All Tomorrow's Parties
Adventures on the Sun.

For the last week and a half I've been reading up on Solaris 10 again. The last time I touched it, about a year ago, I was just screwing around with no real interest in using it in a production environment. After reading a few posts over at Theo Schlossnagle's blog about choosing Solaris over Linux, along with his OSCON slides on the same topic (both focused on PostgreSQL performance), I became much more interested in Solaris 10.

(hdp made noises about how the company Schlossnagle works for evidently wrote OmniMTA, which is what the Cloudmark Gateway product uses, among other things; it's a small enough world, after all.)

We have a service at work which stores spam for 30 days. We refer to the messages as "discards", because the system has decided you probably don't want to see them, but it's not like we're going to drop the things on the floor. The thing is, it's insanely slow, right to the very edge of usability (and probably beyond for the vast majority of people). Getting results out of the database takes minutes.

There are a number of issues with the system as a whole, but evidently Postgres configuration is not one of them (jcap, my predecessor, set the service up properly, and a PgSQL Guy agreed there wasn't much else that could be done on the service end). So that leaves hardware and OS optimizations. The hardware is fine, save for the fact that it's running on metadisk, which is running on SATA (read: bloody slow, especially for PgSQL, which is a pig for disk I/O). We'll be fixing that with a SCSI JBOD and a nice SCSI RAID card RSN. The OS is Linux, and has also been optimized... to a point. Screwing with the scheduler would probably get me something. However, based on my own research (I've read both the Basic Administration and Advanced Administration books over at docs.sun, as well as numerous websites, etc.) and Schlossnagle's posts, I've made up my mind that Solaris is the way to go here. So what sold me?

Well, there are the standard features, all new to Solaris 10:

  • ZFS
  • StorageTek Availability Suite (I can't seem to get away from network block devices... we use DRBD right now, and frankly I've really come to hate it; but the basic idea is sound enough and far too useful to ignore)
  • Fault Management
  • Zones (not very useful to me in this case)
  • Service Management Facility (while not a deal-breaker or maker, it's incredibly nice being able to define service dependencies and milestones; it also ties into FM)
  • DTrace (for me, this is a deal-maker; check out the DTraceToolkit for examples why, compared to debugging problems under Linux, it's a huge win for any admin)
  • Trusted Extensions (while really interesting and hardcore, not something I care much about just yet)
  • Stability (not only in terms of the system itself, but the APIs, ABIs, and the like; you can use any device driver written for the last three major versions of Solaris in 10 -- compare not only the technology there, but the philosophy behind it, to any freenix)
  • RBAC (while not something I'm going to use immediately, it's something that I really want to utilize moving forward)

That's a fair feature-set that should get any admin to perk up and take notice. Of course, if it weren't for OpenSolaris I wouldn't care. Solaris 8 and 9 are sturdy and well-known systems, but I have no interest in them. They don't get me anything except service contracts and annoying interfaces. With OpenSolaris, Sun is actively making progress in the same friendly directions freenixes have always tried for -- while adding some seriously engineered and robust tech into the mix. It's a nice change. A more open development model, with lots of incremental releases (building into an official Solaris 10 release every six months or so), gives me the warm fuzzies.

So, now that the advertisement is out of the way, what are my impressions after a week of configuring and using it?

Well, Solaris with great new features is still Solaris. Config files are in strange places for legacy reasons, there are symlinks to binaries (or binaries) in /etc, SunSSH is ... well. SunSSH (perhaps sometime soon they'll just switch over to OpenSSH and all the evil politicking can be forgotten, yes?). /home is not /home because it's really /export/home.

Commands exist in odd locations that aren't in my path by default, and logging is strange. In short, it's a commercial UNIX. It's steeped in history and the reasons for things are not always immediately clear. The documentation (from docs.sun, OpenSolaris, and the man pages alike) is excellent, though. I am not coming to Solaris as a total newb. I've used it before, but not particularly extensively; the learning curve is steep, as expected.

As always, UNIX is UNIX. Nothing changes but the dialect and where they put the salad fork.

So, I've got this core system that does lots of really great stuff, some of which is confusing and maybe not so great, but overall it's a pretty obvious win. Unfortunately it has a bunch of tools I'm not used to, or don't like, and it lacks a lot of tools I require. So I need to go out and find a third-party package management utility. Well, you've got Sun Freeware, which is pretty basic. There's Blastwave, which has a large repository of software and a trivial way of interfacing with it all, but seems to have some QA issues (that's an old impression and may no longer be valid).

And then there's pkgsrc, the NetBSD packages collection (their take on a ports system). And you know what? It's pretty great. After bootstrapping pkgsrc's gcc from Sun Freeware's (Sun packages gcc now, so you have access to a compiler with the OS -- this was not true before -- but apparently Sun's gcc is Weird and not to be trusted), I was building packages with no issues whatsoever. OpenSSH, Postfix, PgSQL, vim7... Anyway, with an hour's worth of work (which only ever need be done once, on one system, to build the packages), you've got all the programs you're used to using, or require. Suddenly the weird and craggy vastness of Solaris -- expat from the world of commercial UNIX -- becomes much more friendly and livable.

A couple of simple hints about your environment: Set TERM=dtterm and add TERMINFO=/usr/share/lib/terminfo. The former seems to be the proper terminal type for xterm or xterm-like terminals, and the latter fixes pgup/pgdown in pagers and vim, though not in Sun's vi.
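
For the record, here's roughly what those two settings look like in a login script. This is a minimal sketch, assuming a Bourne-compatible shell (sh, ksh, bash) and that it lives in something like ~/.profile; the file location is my assumption, so adjust to taste.

```sh
# Assumption: sourced from a Bourne-style login script such as ~/.profile.
# Terminal type for xterm and xterm-like terminals on Solaris 10.
TERM=dtterm
# Point terminfo-aware programs (pagers, vim) at the system terminfo database.
TERMINFO=/usr/share/lib/terminfo
export TERM TERMINFO
```

If you live in csh or tcsh instead, the equivalent would be a pair of setenv lines.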

It's also easy to create your own packages -- something we've been wanting to do at work for a long time (before I started, certainly). Moving our current custom "packaging" system to pkgsrc would be tedious, but certainly something we could automate with some work. Standardizing on it would be a big win not just for the Solaris servers, but for the architecture as a whole. So, a double win.

(I would be remiss not to mention Nexenta, a project which merges GNU software into OpenSolaris's base. It's very, very interesting, especially in that they use Ubuntu's repos, but regardless of the purported stability of their alpha releases, I can't say I am very interested in running it on my servers. Still, it's definitely something someone who wants to give Solaris 10 a poke without too much effort should take a look at, the same way that Ubuntu is there for people who want to try out Linux. I imagine, frankly, that eventually they will occupy the same ecological niche.)

As you might have guessed, I'm quite happy with my week and change of testing. All the basic management is similar to my BSD experience, and the vast wealth of information I can trivially get out of the system compared to other UNIXes makes it hard to argue against. pkgsrc means not only I, but our developers, have access to everything they need or are used to. The default firewall is ipf, which I'm not thrilled about (pf is my bag), but is certainly usable, and no doubt an improvement over whatever came before.

My next step is to take a Reduced Networking installation and build it up into a Postgres server running Slony-I for replication services. I expect it to go pretty smoothly. The step after that will be a JumpStart server to live next to my (now beloved) FAI server.

There are a few things I need to pursue before we roll to live testing, including root RAID (the X2100's nvidia "RAID" is not supported by Solaris 10, weirdly enough). ZFS root is apparently possible, though tedious and weird. It would give me a trivial way to do mirroring, though. An install server would probably make it easier to do (though that's just a guess). Barring that, I'm guessing that a ZFS pool for system volumes (/usr, /var) and a ZFS pool for data/applications would be good enough, with / mirrored in the typical way (which certainly appears to be simple) until ZFS on root becomes common and supported.
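
To give an idea of how little ceremony the data pool half of that plan involves, here's a rough sketch. The pool name, the filesystem name, and both disk names are made up for illustration (check format for your real devices), and by default it only prints the commands rather than running them.

```sh
# Hypothetical names, not from any real box: pool "data", disks c1t0d0 and c1t1d0.
POOL=data
DISK_A=c1t0d0
DISK_B=c1t1d0

# Prints the commands by default; run as "RUN= plan_pool" to actually execute them.
plan_pool() {
    # Two-way mirror: ZFS checksums every block and repairs from the good copy.
    ${RUN-echo} zpool create "$POOL" mirror "$DISK_A" "$DISK_B"
    # A filesystem inside the pool for the database to live in.
    ${RUN-echo} zfs create "$POOL/pgsql"
}

plan_pool
```

Two commands and you have a mirrored, checksummed filesystem, which is most of why the idea of ZFS everywhere is so tempting.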

I expect I'll drop some more updates as I move forward. Hopefully with good news for our PgSQL performance. ;-)

<bda> "Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS will detect it and use another copy to repair it."
* bda sighs.
<bda> It's so dreamy.
<kitten-> You really need a girlfriend.
<bda> I doubt she'd come with fault management and a scriptable kernel debugger.
<kitten-> I suppose you're right.

February 5, 2007 12:05 AM