"That which is overdesigned, too highly specific, anticipates outcome; the anticipation of outcome guarantees, if not failure, the absence of grace."
-- William Gibson, All Tomorrow's Parties
August 11, 2004

Got bored today and decided to install Solaris 10 beta 5 on some boxes. Keeping in mind that my experiences with commercial UNIX has always left a sour taste in my mouth (IRIX, AIX), and that I have very specific ideas about what UNIX is, you greybeards may want to take this with a shot of J.D. or something. Also keep in mind that this is beta software.

February 21, 2006

Installed Solaris 10 on my Dell PowerEdge 1400SC the other day. Just got around to logging into it. Man, I don't remember anything about Solaris except ps -ef. Pretty sad. It took a damn long time to get installed, and did not enjoy playing with my cheaper, far more generic, bitch system. Had to swap the PE out of being a fileserver so I could get Solaris installed.

Going to be playing with the Sun LDAP server, methinks. If Solaris Zones actually did virtualization, I would consider using them for some virtual server stuff I'm going to need to do, but pity, they aren't. Still, they're pretty awesome. I just don't know what I'd want to give someone a shell on a Solaris box for. ;-)

(Note: Image does not actually inspire confidence.)

March 25, 2006

I guess almost two months ago now I started playing around with Solaris 10. I spent a lot of time reading up on it, and even ordered a SunFire X2100 because I figured I might actually want to run it in production (things like Zones, DTrace, SMF, etc, just are that awesome). I probably wouldn't have thought of getting the SunFire, but mdxi was very positive about his experience with the machine.

As I get older, I seem to notice the noises computers in my room makes more and more. Whenever my fileserver (which is across the room -- but it's a really small room) runs it cron jobs at night (which are all very heavy I/O) it sounds like a small motor is chunking through a duck or something. Last night I finally got tired enough of it that I cleaned out my closet with the intention of moving any machines that don't require a display in there. I also figured I would take the opportunity to do something productive with Solaris 10: Replace my current OpenBSD Samba server.

February 5, 2007

For the last week and a half I've been learning up on Solaris 10 again. The last time I touched it, about a year ago, I was just screwing around with no real interest in using it in a production environment. After reading a few posts over at Theo Schlossnagle's blog regarding choosing Solaris over Linux and his OSCON slides relating to the same, both relating to PostgreSQL performance, I became much more interested in Solaris 10.

(hdp made noises about how evidently the company Schlossnagle works for wrote OmniMTA, which is what the Cloudmark Gateway product uses, among other things; evidently it's a small enough world, after all.)

We have a service at work which stores spam for 30 days. We refer to the messages as "discards", because the system has decided you probably don't want to see them, but it's not like we're going to drop the things on the floor. The thing is, it's insanely slow, right to the very edge of usability (and probably beyond for the vast majority of people). Getting results out of the database takes minutes.

There are a number of issues with the system as a whole, but evidently Postgres configuration is not one of them (jcap, my predecessor, set the service up properly, and a PgSQL Guy agreed there wasn't much else could be done on the service end). So that leaves hardware and OS optimizations. The hardware is fine, save for the fact it's running on metadisk, which is running on SATA (read: bloody slow, especially for PgSQL, which is a pig for disk I/O). We'll be fixing that with a SCSI JBOD and a nice SCSI RAID card RSN. The OS is Linux, and has also been optimized... to a point. Screwing with the scheduler would probably get me something. However, based on my own research (I've read both the Basic Administration and Advanced Administration books over at docs.sun, as well as numerous websites, etc), and Schlossnagle's posts, I've made up my mind that Solaris is the way to go here. So what sold me?

Well, there's the standard features all new to Solaris 10:

  • ZFS
  • StorageTek Availability Suite (I can't seem to get away from network block devices... we use DRBD right now, and frankly I've really come to hate it; but the basic idea is sound enough and far too useful to ignore)
  • Fault Management
  • Zones (not very useful to me in this case)
  • Service Management Facility (while not a deal-breaker or maker, it's incredibly nice being able to define service dependencies and milestones, it also ties into FM)
  • DTrace (for me, this is a deal-maker; check out the DTraceToolkit for examples why, compared to debugging problems under Linux, it's a huge win for any admin)
  • Trusted Extensions (while really interesting and hardcore, not something I care much about just yet)
  • Stability (not only in terms of the system itself, but the APIs, ABIs, and the like; you can use any device driver written for the last three major versions of Solaris in 10 -- compare not only the technology there, but the philosophy behind it, to any freenix)
  • RBAC (while not something I'm going to use immediately, it's something that I really want to utilize moving forward)

That's a fair feature-set that should get any admin to perk up and take notice. Of course, if it weren't for OpenSolaris I wouldn't care. Solaris 8 and 9 are sturdy and well-known systems, but I have no interest in them. They don't get me anything except service contracts and annoying interfaces. With OpenSolaris, Sun is actively making progress in the same friendly directions freenixes have always tried for -- while adding some seriously engineered and robust tech into the mix. It's a nice change. A more open development model, with lots of incremental releases (building into an official Solaris 10 release every six months or so) give me the warm fuzzies.

So, now that the advertisement is out of the way, what are my impressions after a week of configuring and using it?

Well, Solaris with great new features is still Solaris. Config files are in strange places for legacy reasons, there are symlinks to binaries (or binaries) in /etc, SunSSH is ... well. SunSSH (perhaps sometime soon they'll just switch over to OpenSSH and all the evil politicking can be forgotten, yes?). /home is not /home because it's really /export/home.

Commands exist in odd locations that aren't in my path by default, logging is strange. In short, it's a commercial UNIX. It's steeped in history and the reasons for things are not always immediately clear. The documentation (both from docs.sun, OpenSolaris, and the man pages) is excellent. I am not coming to Solaris as a total newb. I've used it before, but not particularly extensively; the learning curve is expectedly high.

As always, UNIX is UNIX. Nothing changes but the dialect and where they put the salad fork.

So, I've got this core system that does lots of really great stuff, some of which is confusing and maybe not so great, but overall it's a pretty obvious win. Unfortunately it has a bunch of tools I'm not used to, or don't like, and it lacks a lot of tools I require. So I need to go out and find a third-party package management utility. Well, you've got Sun Freeware, which is pretty basic. There's Blastwave, which has a large repository of software, a trivial way of interfacing with it all, but seems to have some QA issues (that's an old impression and may have become invalidated).

And then there's pkgsrc, the NetBSD Ports System. And you know what? It's pretty great. After bootstrapping pkgsrc's gcc from Sun Freeware's (Sun packages gcc now, so you have access to a compiler with the OS -- this was not true before -- but apparently Sun's gcc is Weird and not to be trusted), I was building packages with no issues whatsoever. OpenSSH, Postfix, PgSQL, vim7... Anyway, with an hour's worth of work (which only ever need be done once, on one system, to build the packages), you've got all the programs you're used to using, or require. Suddenly the weird and craggy vastness of Solaris -- expat from the world of commercial UNIX -- becomes much more friendly and livable.

A couple simple hints about your environment: Set TERM=dtterm and add TERMINFO=/usr/share/lib/terminfo. The former seems to be the proper shell for xterm or xterm-like terminals, and the latter fixes pgup/pgdown in pagers and vim, though not in Sun's vi.

It's also easy to create your own packages -- something we've been wanting to do at work for a long time (before I started, certainly). Moving our current custom "packaging" system to pkgsrc would be tedious, but certainly something we could automate with some work. Standardizing on it would be a big win not just for the Solaris servers, but for the architecture as a whole. So, a double win.

(I would be remiss not to mention Nexenta, a project which merges GNU software into OpenSolaris's base. It's very, very interesting, especially in that they use Ubuntu's repos, but regardless of the purported stability of their alpha releases, I can't say I am very interested in running it on my servers. Still, it's definitely something someone who wants to give Solaris 10 a poke without too much effort should take a look at. The same way that Ubuntu is there for people who want to try out Linux. I imagine, frankly, that eventually they will occupy the same ecological niche.)

As you might have guessed, I'm quite happy with my week and change of testing. All the basic management is similar to my BSD experience, and the vast wealth of information I can trivially get out of the system compared to other UNIXes makes it hard to argue against. pkgsrc means not only I, but our developers, have access to everything they need or are used to. The default firewall is ipf, which I'm not thrilled about (pf is my bag), but is certainly usable, and no doubt an improvement over whatever came before.

My next step is to take a Reduced Networking installation and build it up into a Postgres server running Slony-1 for replication services. I expect it to go pretty smoothly. The step after that will be a JumpStart server to live next to my (now beloved) FAI server.

There are a few things I need to pursue before we roll to live testing, including root RAID (the X2100's nvidia "RAID" is not supported by Solaris 10, weirdly enough). ZFS root is apparently possible, though tedious and weird. It would give me a trivial way to do mirroring, though. A install server would probably make it easier to do (though that's just a guess). Barring that, I'm guessing that a ZFS pool for system volumes (/usr, /var) and a ZFS pool for data/applications would be good enough. Mirroring / in the typical way (which certainly appears to be simple), until ZFS on root becomes common and supported.

I expect I'll drop some more updates as I move forward. Hopefully with good news for our PgSQL performance. ;-)

<bda> "Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS will detect it and use another copy to repair it."
* bda sighs.
<bda> It's so dreamy.
<kitten-> You really need a girlfriend.
<bda> I doubt she'd come with fault management and a scriptable kernel debugger.
<kitten-> I suppose you're right.

February 8, 2007

The LSI SCSI card for our new discards database server (Sun X2100 M2) came in today. I ran up to UniCity to get some cables from Harry (thanks, Harry!) and plugged the Dell PowerVault 210S we got into the thing. All the disks showed up happily in cfgadm -al, but not in format. Telling cfgadm to configure didn't seem to do much for me, so I decided to be lazy and touch /reconfigure and reboot. Once that was done, all seemed to be quite happy.

The goal here is to create a ZFS pool on the JBOD for Postgres to live on. As you can see, it is super easy:

[root@mimas]:[~]# zpool create data2 raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0 c3t13d0
[root@mimas]:[~]# zpool status
pool: data
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
mirror ONLINE 0 0 0
c1d0p3 ONLINE 0 0 0
c2d0p3 ONLINE 0 0 0

errors: No known data errors

pool: data2
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
data2 ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
c3t1d0 ONLINE 0 0 0
c3t2d0 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t5d0 ONLINE 0 0 0
c3t8d0 ONLINE 0 0 0
c3t9d0 ONLINE 0 0 0
c3t10d0 ONLINE 0 0 0
c3t11d0 ONLINE 0 0 0
c3t12d0 ONLINE 0 0 0
c3t13d0 ONLINE 0 0 0

errors: No known data errors
[root@mimas]:[~]# zfs create data2/postgresql
[root@mimas]:[~]# zfs set mountpoint=/var/postgresql2 data2/postgresql
[root@mimas]:[~]# mv /var/postgresql/* /var/postgresql2/
[root@mimas]:[~]# zfs umount data/postgresql
[root@mimas]:[~]# zfs destroy data/postgresql
[root@mimas]:[~]# zfs umount data2/postgresql
[root@mimas]:[~]# zfs set mountpoint=/var/postgresql data2/postgresql
[root@mimas]:[~]# zfs mount data2/postgresql
[root@mimas]:[~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 885M 124G 24.5K /data
data/pkg 885M 124G 885M /usr/pkg
data2 17.1G 346G 44.8K /data2
data2/postgresql 17.1G 346G 17.1G /var/postgresql

RAID-Z is explained in the zpool man page.

While I am a little worried that we really want hardware RAID, I didn't want to spend the extra cash. If we do end up needing the extra performance, trust me, the system can be repurposed easily. ;-)

woot.

February 10, 2007

In which our intrepid sysadmin may as well have been eating wall candy as doing work for all it got him.

I spent all of Friday and most of last night trying to figure out why pgbench was giving me what seemed to be totally insane results.

  • >1000 tps on SATA 3.0Gb/s (Linux 2.6, XFS, metadisk + LVM)
  • ~200 tps on SATA 3.0Gb/s (Solaris 10 11/06, UFS)
  • ~500 tps on SATA 3.0Gb/s (Solaris 10 11/06, ZFS mirror)
  • <100 tps on 10x U320 SCSI (Solaris 10 11/06, ZFS RAID-Z)

Setting recordsize=8192 for the ZFS volume is helpful. Or whatever your PostgreSQL blocksize is compiled with.

It basically came down to Postgres fsync() absolutely murdering ZFS's write performance. I couldn't understand why XFS would perform 10 times as well as ZFS on blocking writes. Totally brainlocked.

And of course the bonnie++ benchmarks were exactly what I expected: the SCSI array was kicking ass all over the place... except with -b enabled, while the lone SATA drive just sort of sucked air next to it on any OS/filesystem.

The zpool iostat <pool> <interval> command is seriously useful. Give it -v to get per-disk stats.

Anyway, I was banging my head against the wall literally the entire day. The ticket for "Test PgSQL on Sol10" is more pages long than I'd like to think about. Finally I bitched about it in #slony, not out of any particular thought that I'd get an answer, just frustration. mastermind pointed out that, hey, cheap SATA drives have, you know... write caches. And like? SCSI drives? They have it like, turned off by default, because reliability is far more important than performance for most people. (Valley-girlese added to relate how stupid I felt once he mentioned it.) And anyway, that's what battery-backed controllers are for: So you get the perf and reliability.

Once he said "write cache", it clicked, total epiphany, and I felt like a complete jackass for spending more than ten hours fighting with what amounts to bad data. In the process I read a fair amount on good database architecture, though, and that will be very helpful in the coming weeks for making our discards database(s) not suck.

Live and learn, I suppose.

The whole experience has really driven home the point that getting information out of a Solaris box is so much less tedious than a Linux box, though. While I was under the impression that the SATA drives were not actually lying to me about their relative performance, I kept thinking "These Sol10 tools are so frakking nice, I don't want to get stuck with Linux tools for this stuff!" Especially the ZFS interaces.

February 22, 2007

OpenSolaris/Solaris Relationships

A useful entry for people who are confused by how OpenSolaris turns into Solaris 10, and what the differences actually are.

Mercurial: a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.

dragorn pointed this out the other day. After having to endure the fist-shaking of my co-workers at svn/svk perhaps they will be satiated. Or not.

OpenSolaris also uses hg, it seems.

March 14, 2007

So as part of freeing up some rackspace at work, I'm throwing a bunch of systems into Solaris Zones. However, some of these systems, while not "mission critical" are pretty important and their IP addresses really shouldn't change (DNS propagation lag would suck).

So my Solaris Zones box is sitting on one our subnets at the colo, the one with the most free addresses. Two of these other systems, however, are on another subnet. There's no good way to currently add a default route for a local zone when the global zone is not also part of that network. I could either waste an IP in that subnet (which I don't want to do), or follow this suggestion and ghetto-hack around it:


[root@chironex]:[~]# cat /etc/hostname.nge0\:99
0.0.0.0
[root@chironex]:[~]# ifconfig nge0:99 plumb up
[root@chironex]:[~]# ifconfig -a
nge0:99: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 0.0.0.0 netmask ff000000
[root@chironex]:[~]# zonecfg -z ircd info
zonename: ircd
zonepath: /export/zones/ircd
autoboot: true
pool:
limitpriv:
fs:
dir: /opt
special: /opt
raw not specified
type: lofs
options: [ro,nodevices]
net:
address: A.B.C.D
physical: nge0
[root@chironex]:[~]# ifconfig nge0:99 A.B.C.D netmask A.B.C.248
[root@chironex]:[~]# route add default 1.2.3.4
add net default: gateway 1.2.3.4
[root@chironex]:[~]# ifconfig nge0:99 0.0.0.0 netmask 255.0.0.0
[root@chironex]:[~]# zoneadm -z ircd boot
[root@chironex]:[~]# ifconfig -a
nge0:5: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
zone ircd
inet A.B.C.D netmask fffffff8 broadcast 1.2.3.5

Works just fine, though.

(If it came down to some network-contention problems, I could pull the same trick on bge0, another physical device in the system... but it won't.)

March 16, 2007
March 17, 2007

Picked up off osnews. Yeah, I should know better. I might catch something.

Solaris 10 11/06 (Update 3) Review

Solaris Express is coming along; and for those who do want bleeding edge, ultra-super-duper features, then Solaris probably isn't your best bet, then again, assuming you're into that stuff, you'd be better catered for by the likes of Gentoo for example - for those of us who would prefer to have stability above features, then give Solaris a go and if you can make a contribution to Solaris by way of code contributions, then by all means do so.

Recommending Gentoo over Solaris.

Gentoo.

To stay in context, though, he is talking about the desktop market. So why did he review Solaris 10? Why not Nexenta, which is geared to for exactly that?

And Gentoo. Instead of say... Ubuntu.

Also, Solaris lacking features that Linux has? That's a bloody joke and a half:

Whatever.

There was this noise a minute ago? Weird wooshing sound? Right over someone's head.

This has been your monthly blog rant against some other blog post some blogging guy somewhere wrote about something.

April 16, 2007

Live Upgrade for Idiots

Huzzah, I was going to start looking for something like that. :-)

Added that site to my feeds as well. woop.

[via stevel]

May 18, 2007

I've been spending a lot of time working on consolidating our services. It's tedious, because we have been a Linux shop since the company was started: there are many Linux and GNUisms. I have yet to question the decision to move as much as I can to Solaris 10, however. Consider:

We have, in the past, had two (more or less) dedicated systems for running MySQL backups for our two production databases. These replicas would stop their slave threads, dump, compress, and archive to a central system. Pretty common. But they were both taking up resources and rackspace that could otherwise be utilized.

Enter Solaris Zones. There's no magic code required for mysqldump and bzip2, so moving them was trivial. The most annoying part of building any new MySQL replica is waiting on the import. But, if you admin MySQL clusters you're probably already really, really used to that annoyance.

So I built a new database backup zone to run both instances of MySQL. Creatively, I named it dbbackup. It ran on zhost1 (hostnames changed to protect the innocent). Unfortunately, zhost1 also runs all our development zones (bug tracking, source control, pkgsrc and par builds) as well our @loghost. Needless to say, the addition of two MySQL dbs writing InnoDB pretty much killed I/O (this is on an X2100 M1 with mirrored 500GB Seagate SATA 3.0Gb/s drives), making the system annoying to use.

This week I deployed two new systems to act as zone hosts, one of which is slated for non-interactive use. So last night I brought down the database backup zone and migrated it over.

This document details the process, which is ridiculously trivial. No, really. The most annoying part was waiting on the data transfer (60GB of data is slow anywhere at 0300).

My one piece of extra advice is: Make sure both systems are running the same patch level before you start. PCA makes this pretty trivial to accomplish.

This is a sparse-root zone, but there are two complications:

  • I delegate a ZFS dataset to the zone, so there are a bunch of ZFS volumes hanging off it. However, they all exist under the same root as the zone itself, so it's not really a big deal.
  • I have a ZFS legacy volume set up for pkgsrc. By default pkgsrc lives in /usr/pkg, /usr is not writable since it's a sparse zone, and I don't really want to deal with moving it. It needs to be mounted at boot time (before the lofs site_perl mounts which contain all our Perl modules in the global zone), however, and after a little bit of poking I couldn't figure out how to manipulate zvol boot orders. Legacy volumes get precedence over lofs, though, so. Ghetto, I know.

The volume set up looks like this:

[root@dbbackup]:[~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 85.3G 348G 24.5K /data
data/zones 41.2G 348G 27.5K /export/zones
data/zones/dbbackup 40.4G 348G 135M /export/zones/dbbackup
data/zones/dbbackup/tank 39.9G 348G 25.5K none
data/zones/dbbackup/tank/mysql 39.9G 348G 8.49G /var/mysql
data/zones/dbbackup/tank/mysql/db2 24.5G 348G 24.5G /var/mysql/db2
data/zones/dbbackup/tank/mysql/db1 6.92G 348G 6.92G /var/mysql/db1

So, my process?

First, shut down and detach the zone in question.

[root@zhost1]:[~]# zlogin dbbackup shutdown -y -i0
[root@zhost1]:[~]# zoneadm -z dbbackup detach

Make a recursive snapshot of the halted zone. This will create a snapshot of each child hanging off the given root, with the vanity name you specify.

[root@zhost1]:[~]# zfs snapshot -r data/zones/dbbackup@migrate

Next, use zfs send to write each snapshot'd volumes to a file.

[root@zhost1]:[~]# zfs send data/zones/dbbackup@migrate > /export/scratch/dbbackup@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/pkgsrc@migrate > /export/scratch/dbbackup-pkgsrc\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank@migrate > /export/scratch/dbbackup-tank\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql@migrate > /export/scratch/dbbackup-tank-mysql\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql/db2@migrate > /export/scratch/dbbackup-tank-mysql-db2\@migrate
[root@zhost1]:[~]# zfs send data/zones/dbbackup/tank/mysql/db1@migrate > /export/scratch/dbbackup-tank-mysql-db1\@migrate

Now, copy each of the dumped filesystem images to the new zone host (zhost2), using scp or whatever suits you. Stare at the ceiling for two hours. Or catch up on Veronica Mars and ReGenesis. Whichever.

Once that's finished, use zfs receive to import the images into an existing zpool on the new system.

[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup < dbbackup\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/pkgsrc < dbbackup-pkgsrc\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank < dbbackup-tank\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql < dbbackup-tank-mysql\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql/db2 < dbbackup-tank-mysql-db2\@migrate
[root@zhost2]:[/export/scratch]# zfs receive data/zones/dbbackup/tank/mysql/db1 < dbbackup-tank-mysql-db1\@migrate

Before I could attach the zone, I needed to set the mountpoints for the dataset and legacy volumes properly.

[root@zhost2]:[~]# zfs set mountpoint=legacy data/zones/dbbackup/pkgsrc
[root@zhost2]:[~]# zfs set mountpoint=none data/zones/dbbackup/tank

Also, since the zone was living on a different network than the host system, I needed to add a default route for that network to the interface. I talked about this earlier, and it's a workaround that should be going away once NIC virtualization makes it into Solaris proper from OpenSolaris (I would guess u5?).

[root@zhost2]:[~]# ifconfig nge0:99 A.B.C.D netmask 255.255.255.224
[root@zhost2]:[~]# route add default A.B.C.1
add net default: gateway A.B.C.1
[root@zhost2]:[~]# ifconfig nge0:99 0.0.0.0 netmask 255.0.0.0

Now, create a stub entry for the zone with zonecfg.

[root@zhost2]:[~]# zonecfg -z dbbackup
dbbackup: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:dbbackup> create -a /export/zones/dbbackup
zonecfg:dbbackup> exit

And that's pretty much it.

Attach the zone and boot it and you're done.

[root@zhost2]:[~]# zoneadm -z dbbackup attach
[root@zhost2]:[~]# zoneadm -z dbbackup boot

Once it was up, I logged in, made sure the MySQL instances were replicating happily and closed the ticket for moving the zone.

This level of flexibility and ease of use is key. In addition to the other technologies included in Solaris 10, you'd be crazy not to be utilizing it. (Even with the annoying bits still lurking in Sol10, it's absolutely worth the effort.)

And it's only going to get better.

May 31, 2007

One of my major blocking tasks right now is to rebuild our Perl tree (about 600 modules) as PARs for easy distribution. I'm not using SRV4 or pkgsrc packages for them because I want to have one build system for both our Linux and Solaris systems. Much of our code relies on version-specific behaviors, so the only way for me to actually get stuff ported (without going totally batshit) from the entrenched Linux systems to the Solaris boxes is to rebuild all those modules, at those versions.

Yesterday, I spent most of the day compiling one Perl module (Math::Pari, which relies on the pari math libraries, and engages me in an indecent amount of skullfuckery whenever I try to build it. This particular adventure into stupidity was caused mainly by braindead pkgsrc dependencies... pari relies on teTeX -- to build its documentation -- which relied on X11. I spent a good portion of time trying to get it working the way it wanted until finally just ripping the X bits out. Guess how long that took. Yup. Bare minutes.). Finally I just built pari by hand to /opt and linked Math::Pari against that. Could have saved hours and hours...

Spent all of today building the rest of our Perl modules. Got down from ~600 to 67. Pulling the modules was easy enough was hdp mentioned the by-module listing on the CPAN. The vast majority of modules were well-behaved; it was simply a matter of iterating over the modules, running

perl Makefile.PL --skipdeps && \
make && make test && \
cd blib \
zip -r $DEST/$MODULE-i386-solaris-5.8.8.par *

and then installing it. Not a big deal. Some of them were tenacious and obnoxious, though, and ate up a lot of time. We can theoretically (this has not proven to be completely true) skip deps as any dependencies should exist in our local tree. I'm sure once I'm done I'll have to check to make sure all the modules actually have their deps, but the vast majority should.

Tomorrow I get to finish this up and then maybe get some working code running in some zones. Huz-freakin'-zah.

June 6, 2007

OpenSolaris: Five updates conservative developers should make

Some really good points listed therein.

I dream about the days when our code all runs happily on Solaris, is packaged up in SRV4 streams, and I can open "add DTrace providers to stuff" tickets...

June 8, 2007

< brendang> # dtrace -n 'zangband:::'
< brendang> dtrace: description 'syscall:::' matched 470 probes
< brendang> CPU ID FUNCTION:NAME
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 pink_jelly:hits_you
< brendang> 1 20164 character:dies

July 19, 2007
August 24, 2007

< bda> Legacy.
< bda> This is an inherited mess.
< bda> (Which is not much of an excuse, but it's what I've got.)
< dwc-> bda: it's understandable
< dwc-> I've had plenty of uh, legacy ... cruft. to deal with for awhile before it can finally get tossed
< bda> dwc-: The vast majority of the other cruft has been replaced with shiny, maintainable things.
< bda> This site is hours away, I have no easy mode of transit, blah blah blah.
< bda> Soon I will build a pod of Sun boxes and hitchhike up there with a scary old man who insists I hold his dentures for him.
< Tempt> a pod of Sun boxes
< Tempt> Does the pod open and give birth to an army of drones?
< bda> No, an army of zones.
< Tempt> boom-tish

October 4, 2007

After messing around with plain old Jumpstart for a day, I got sick of it and decided to try out Jumpstart Enterprise Toolkit after eryc mentioned it, a bunch of code living on stop of Jumpstart meant to make lives easier. It does. Getting things set up, adding hosts, etc, goes from being kind of tedious to trivial. The real killer for me was dealing with Solaris's DHCP manager. Man, what a weird, annoying thing.

So now I have Jumpstart set up in Parallels on my laptop (that's 30GB I won't be getting back anytime soon), which is a pretty useful thing to have. I suppose next I'll set an FAI VM for those Debian boxes I still haven't replaced...

Here is the HOWTO I used as a starting point, and also the JET wiki.

Someone in #opensolaris yesterday mentioned they had a Debian Etch zone branded zone running. And it looks pretty trivial to do, too.

Derek Crudgington, of Joyent, has a post over on his blog about using DTrace to instrument MySQL (which does not have any DTrace probes). As long as you know the function names you're interested in, you can some really useful information out of it.

The fact that you can get that information, which would typically get you a major performance hit from MySQL itself, without MySQL having to be touched, restarted, or impaired, is just another example of how great DTrace is.

October 9, 2007

Several months ago, after watching Bryan Cantrill's DTrace talk at Google, I went looking for the then-current state of DTrace userstack helpers for Perl. We're a big Perl shop; being able to get ustacks out of Perl would be a pretty major thing for me. I came across a blog post by Alan Burlison who had patched Perl 5.8.8 with subroutine entry/return probes, but couldn't, at the time, find a patch for it. So I forgot about it.

The other day I re-watched that talk and went looking again. Discovering, in the process, that Richard Dawe had reproduced Alan's work and released a diff. Awesome!

So the basic process is this:

  • Get a clean copy of Perl 5.8.8
  • Get Richard's patch
  • Read the instructions in the patch file
    • note that you have to build with a dynamic libperl!
  • Use gpatch to patch the source, and configure Perl as usual
$ cd perl-5.8.8
$ gpatch -p1 -i ../perl-5.8.8-dtrace-20070720.patch
$ sh Configure

Noted by Brendan Gregg, you'll also need to add a perldtrace.o target to two lines in the Makefile (line numbers may differ):

274          -@rm -f miniperl.xok
275          $(LDLIBPTH) $(CC) $(CLDFLAGS) -o miniperl \
276              miniperlmain$(OBJ_EXT) opmini$(OBJ_EXT) $(LLIBPERL) $(libs) perldtrace.o
277          $(LDLIBPTH) ./miniperl -w -Ilib -MExporter -e '' || $(MAKE) minitest
278
279  perl$(EXE_EXT): $& perlmain$(OBJ_EXT) $(LIBPERL) $(DYNALOADER) $(static_ext) ext.libs $(PERLEXPORT)
280          -@rm -f miniperl.xok
281          $(SHRPENV) $(LDLIBPTH) $(CC) -o perl$(PERL_SUFFIX) $(PERL_PROFILE_LDFLAGS) $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(DYNALOADER) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs) perldtrace.o

As the patch instructions state, you'll need to generate a DTrace header file, running:

$ make perldtrace.h
/usr/sbin/dtrace -h -s perldtrace.d -o perldtrace.h
dtrace: illegal option -- h
Usage: dtrace [-32|-64] [-aACeFGHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]

Ouch, ok, apparently dtrace -h is broken on Solaris 10u3. I mentioned this on #dtrace, and Brendan suggested I find a Perl script posted to dtrace-discuss by Adam Leventhal to emulate dtrace -h behavior.

But I'm lazy and have Solaris 10u4 boxes, so I just generate the header file on one of those and copy it over to the u3 box.

Once you have perldtrace.h in place, run make as normal, get a cuppa, whatever.

As soon as your make is done running, check the patch file for instructions on running a simple test to see if it's working. I have yet to have any issues.

Now, as Alan mentions in his blog, there's a chance you could eat a 5% performance hit. For me, that would be worth it, due to the complexity of our codebase and the fact I am sometimes (though thankfully not recently) called upon to debug something I am wholly unfamiliar with at ungodly hours of the night. Digging around for the problem is hard as adding debugging to running production code is simply not going to happen. With a DTrace-aware Perl, it's simply a matter of crafting proper questions to ask and writing wrappers to make the inquiries.

I'm certainly not at a point where I can do that, but I reckon it won't be long after I've deployed our rebuilt Perl packages that I'll be learning "A is for Apple ... D is for DTrace".

To simply quantify that performance hit, rjbs suggested we run the Perl test suite on various builds. Below I have (again, very simple) metrics on how long each build took to run the tests. As DTrace requires a dynamic libperl, which is going to be a performance hit of some (unknown to me) value, I have both static and dynamic vanilla (no DTrace patch) build times listed.

Build type real/user/sys
Vanilla Perl, static libperl 8m44.880s/3m44.770s/1m41.745s
8m48.657s/3m48.574s/1m41.623s
8m46.513s/3m46.272s/1m41.728s
Vanilla Perl, dynamic libperl 9m41.212s/4m32.217s/1m49.256s
9m57.276s/4m47.755s/1m49.443s
9m43.576s/4m34.341s/1m49.520s
Patched Perl, dynamic libperl, not instrumented 10m17.740s/4m32.825s/1m49.017s
10m16.507s/4m32.982s/1m49.350s
10m22.689s/4m38.937s/1m49.287s

If the tests suite is indeed a useful metric, the hit is certainly not nothin'. I suspect there would be ways to mitigate that hit, though.

As soon as I gain some clue (or beg someone in #dtrace for the answer), I'll run the same tests while instrumenting the Perl processes. Just need to figure out how to do something like

syscall:::entry
/execname == "perl"/
{
  self->follow = 1;
}

perl$1:::sub-entry, perl$1:::sub-return
/self->follow/
{ ... }


when the Perl processes I want to trace are completely ephemeral.

October 10, 2007

Noticing the question in my previous post about ephemeral processes, seanmcg in #dtrace suggested I write something akin to this, which did occur to me, vaguely, as a possibility. But it seemed like far more complexity than I wanted to create, and starting/stopping processes to kick off watchers sounded like a good way to impact performance in an already loaded environment (read: our mailservers). I knew there had to be a better way to do it than wrapping DTrace up in Perl so I could monitor Perl, but I couldn't figure out how to do it with the pid::: provider. Well, you can't. But!

< brendang> the wildcard "*" doesn't work properly for the pid provider, but does work for the USDT language providers
< brendang> most of the language examples in the new DTraceToolkit use perl*:::, mysql*:::, javascript*:::, etc

Obviously DTT should have been the first place I looked, instead of whining. :-)

So if you are trying to follow something specific with the pid:::, seanmcg's method is certainly viable. I just wanted to glob onto all Perl processes, though.

Brendan also offered the following (as I was thinking about it backwards in my previous post):

#!/usr/sbin/dtrace -Zs

perl*:::sub-entry
{
self->sub = copyinstr(arg0);
}

syscall:::entry
/self->sub != NULL/
{
printf("Perl %s() called syscall %s()", self->sub, probefunc);
}

perl*:::sub-return {
self->sub = 0;
}

Start 'er up in Terminal A:


[20071010-00:10:31]:[root@mako]:[~]# ./perlsubs.d
dtrace: script './perlsubs.d' matched 232 probes

Kick off one our simple but venerable helper scripts, with shebang set to the patched Perl:


[20071010-00:10:34]:[root@mako]:[~]# ./spool-sizes.pl -h

usage: spool-sizes.pl [-tabcdimsvh]
-t: global threshold (default = 1000 messages)
-a: active spool threshold (default = $threshold)
-H: hold spool threshold (default = $threshold)
-c: corrupt spool threshold (default = $threshold)
-d: deferred spool threshold (default = $threshold)
-i: incoming spool threshold (default = $threshold)
-n: no mail (do not mail, but create file in /var/tmp/spool-sizes)
-T: add a composite "total" spool
-v: visual (i.e. output to console vs. file and do not mail)
-h: help (this message)

And, back in Terminal A:


CPU ID FUNCTION:NAME
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
0 40463 stat64:entry Perl BEGIN() called syscall stat64()
...
0 40097 close:entry Perl BEGIN() called syscall close()
0 40325 systeminfo:entry Perl hostname() called syscall systeminfo()
0 40185 ioctl:entry Perl usage() called syscall ioctl()
0 40467 fstat64:entry Perl usage() called syscall fstat64()
0 40093 write:entry Perl usage() called syscall write()
0 40093 write:entry Perl usage() called syscall write()
0 40093 write:entry Perl usage() called syscall write()
...


And here's the output of Alan B's example script:


[20071010-00:10:41]:[root@mako]:[~]# ./perlsubs2.d
dtrace: script './perlsubs2.d' matched 7 probes
^C
CPU ID FUNCTION:NAME
0 2 :END 2 import /opt/perl/perl5.8.8/lib/5.8.8/warnings.pm
3 import /opt/perl/perl5.8.8/lib/5.8.8/strict.pm
6 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/vars.pm
6 bits /opt/perl/perl5.8.8/lib/5.8.8/strict.pm
11 import /opt/perl/perl5.8.8/lib/5.8.8/AutoLoader.pm
25 import /opt/perl/perl5.8.8/lib/5.8.8/Exporter.pm
26 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/Sys/Hostname.pm
32 load /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/XSLoader.pm
62 AUTOLOAD /opt/perl/perl5.8.8/lib/5.8.8/i86pc-solaris/POSIX.pm
68 BEGIN /opt/perl/perl5.8.8/lib/5.8.8/warnings.pm
85 BEGIN ./spool-sizes.pl
271 PERL PERL

This won't be useful at all. Tomorrow I'm going to try and get back to porting our MX dispatching software to Solaris. hdp says all the tests pass, so it should just be a matter of making sure each of the associated daemons work properly, have manifests, etc.

And then, the fun part: Writing a little something I've been referring to as mailflow.d...

October 13, 2007
October 15, 2007

Based on Albert Lee's howto:


[20071015-08:38:52]:[root@clamour]:[~]# uname -a
SunOS clamour 5.10 Generic_120012-14 i86pc i386 i86pc
[20071015-08:38:53]:[root@clamour]:[~]# zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
3 control running /export/zones/control native shared
4 lunix running /export/zones/lunix lx shared
[20071015-08:38:56]:[root@clamour]:[~]# zlogin lunix
[Connected to zone 'lunix' pts/5]
Last login: Mon Oct 15 12:37:28 2007 from zone:global on pts/4
Linux lunix 2.4.21 BrandZ fake linux i686

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
lunix:~#

After I stop laughing hysterically, visions of collapsing Linux boxes into Solaris zones dancing through my twitching little mind, I'll have to see how twitchy the install itself is. Already it appears that some stuff is unhappy, though most of it seems to revolve around things that don't matter (ICMP oddities, console oddities wrt determing how smart it is for restarting services -sigh- and a few other easily surmountable or ignorable things).

Overall: Hello, awesome.

(Update: It appears that 6591535 makes this a non-starter. I am now, again, a very sad bda with a bunch of crappy hardware and nowhere to move their services to.)

November 1, 2007

So the "OpenSolaris Developer Preview" was released last night. I spent a few minutes with it, and it has generated a frankly ridiculous amount of controversy inside the community (to the point where my inbox has tripled in size). So what's the deal?

Well, it's actually a pretty decent first release. It has ZFS on root (awesome!), the Image Packaging System (which is way cool), and is almost as trivial to install as Ubuntu. People are whining about a bunch of nonsense (wah, the default shell, wah, no KDE in the first release, wah), but by far the biggest complaints center around Indiana taking the OpenSolaris name. This gets a big fat whatever from me.

I'm wondering if anyone complaining has actually read the FAQ.

Dennis Clarke has some screenshots over at Blastwave.

If you are thinking of giving it a try, you should probably read through the immigrants page. benr++

I installed it without issues in a Parallels VM. It panicked on boot a couple times, though I suspect that it more to do with Parallels than Solaris. I need to make an image of the compiler tools so I can get the NIC supported, but that is a pretty trivial thing. I suspect I will not bother and just build at workstation at the office and throw Indiana on there.

Overall I think this is a fairly exciting milestone for OpenSolaris. Their release schedule of every six months is encouraging, as it works very well for certain other high-quality projects. The barrier for adoption has fallen and now that code has been thrown over the wall, perhaps people can start contributing instead of ... not.

Well, once they get over accusing the Indiana guys of "stabbing the community in the back" and eating babies...

November 16, 2007

I have an OpenSolaris box in pilot at the moment, running build 74. It uses Lori Alt's patched miniroot so I can set up a rootpool and do a profile (network, via Jumpstart) install. It works really well.

Yesterday the box went into a reboot loop, and as there appears there are issues with b74, I figured I would finally get around to learning how to use BFU (which is a change-aware wrapper around cpio that writes to /; it's not something you can back out of). But before I did that, I would need to figure how to boot from a ZFS clone. If the BFU goes south, or if the new build bricks the box, I need a way to boot to back into the old system. It's the poor man's LiveUpgrade, I suppose, but it's still way cool and (I think) much easier.

So that's the goal here: Take a snapshot of the current system, clone the snapshot so it's writable, and then upgrade the clone. This way we can BFU the system and still have a fallback in the event that the BFU fails, or the new OS/Net build bricks our box.

Tim Foster had already written a blog post about how easy this was, so I wasn't expected to run into any problems.

First, grab the ON build tools and the BFU archives for the build you care about.


[root@octopus]:[~] cd /tmp
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# wget http://dlc.sun.com/osol/on/downloads/b75/on-bfu-nightly-osol-nd.i386.tar.bz2

You probably want to do that in tmp (which is swap) so when you take your snapshots, big random files are not littering the filesystem forever.

Set up your build environment:

[root@octopus]:[/tmp]# bunzip2 on-bfu-nightly-osol-nd.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf on-bfu-nightly-osol-nd.i386.tar
[root@octopus]:[/tmp]# bunzip2 SUNWonbld.i386.tar.bz2
[root@octopus]:[/tmp]# tar -xf SUNWonbld.i386.tar 
[root@octopus]:[/tmp]# cd onbld/
[root@octopus]:[/tmp/onbld]# pkgadd -d . SUNWonbld 
[root@octopus]:[/tmp/onbld]# cd
[root@octopus]:[~]# export FASTFS="/opt/onbld/bin/i386/fastfs"
[root@octopus]:[~]# export GZIPBIN="/usr/bin/gzip"
[root@octopus]:[~]# export BFULD="/opt/onbld/bin/`uname -p`/bfuld"
[root@octopus]:[~]# export PATH="/opt/onbld/bin:/opt/onbld/bin/`uname -p`:$PATH"

Now we need to take a snapshot of our current rootfs, clone it is writable, and mount it. In my setup, the rootpool is a legacy mount, and anything under it is also going to inherit the legacy mount property.


[root@octopus]:[~]# zfs snapshot rootpool/b74@upgrade
[root@octopus]:[~]# zfs clone rootpool/b74@upgrade rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=/rootpool/b75 rootpool/b75

Now it's time to do the actual upgrade. I ran into two very minor snags here. First, I don't have BIND installed, so I needed to pass -f to bfu. Secondly, I don't have D-BUS installed, and had to comment that check out of the bfu script. Once that's done, it goes off and does it's thing happily.

Once the BFU finished you'll be put into a safe environment with tools built to work regardless of how horribly the BFU may have messed up your system (not an issue here, as we aren't actually modifying our current rootfs). As soon as it's done, you'll need to resolve the conflicts it lists; thus far I have not had an issue with using Automated Conflict Resolution to merge those files.


[root@octopus]:[~]# bfu -f /tmp/archives-nightly-osol-nd/i386 /rootpool/b75
bfu# /opt/onbld/bin/acr /rootpool/b75

And that's it. Your clone has now been upgraded using BFU. Create a boot archive of the new BE and set it legacy again.


[root@octopus]:[~]# bootadm archive-update -R /rootpool/b75
[root@octopus]:[~]# zfs set mountpoint=legacy rootpool/b75

You have a couple options for managing your boot environments at this point. You can either modify /rootpool/boot/grub/menu.lst yourself, or use Tim Foster's zfs-bootadm.sh to do it for you. The script relies on a property to determine which zfs fs are bootable, so you'll need to set that.


[root@octopus]:[~]# ./zfs-bootadm.sh
Usage: zfs-bootadm.sh [command]

where command is one of:
create
Creates a new bootable dataset as a clone
of the existing one.
activate
Sets a bootable dataset as the next
dataset to be booted from.
destroy
Destroys a bootable dataset. This must not
be the active dataset.
list
Lists the known bootable datasets.

[root@octopus]:[~]# zfs set bootable:=true rootpool/b75
[root@octopus]:[~]# ./zfs-bootadm.sh list
b74 (current)
b75
test
[root@octopus]:[~]# ./zfs-bootadm.sh activate b75
Currently booted from bootable dataset rootpool/b74
On next reboot, bootable dataset rootpool/b75 will be activated.
[root@octopus]:[~]# reboot

The box reboots, and...


[bda@moneta]:[~]$ ssh root@octopus
Last login: Fri Nov 16 02:50:21 2007 from 10.10.1.20
Sun Microsystems Inc. SunOS 5.11 snv_75 October 2007
bfu'ed from /tmp/archives-nightly-osol-nd/i386 on 2007-11-16
Sun Microsystems Inc. SunOS 5.11 snv_74 October 2007
[root@octopus]:[~]# uname -a
SunOS octopus 5.11 snv_75 i86pc i386 i86pc

Pretty dang cool stuff!

My initial test here was to BFU from b74 to b76. After some fumbling about with where the menu.lst file was (I knew it was stored on the rootpool from reading Lori Alt's weblog and various presentations, but rootpool was a legacy mount, so stupid tired me was confused for a good ten minutes). The BFU and acr itself appeared to be fine, and when I finally got the BE to boot, it... panicked.

I was somewhat discouraged, but booted right back into b74 and BFU'd happily to b75.

Which was the entire point of the exercise: To upgrade the system and have a safe way to fall back to a previous build if the system becomes unusable. As I said, it's the poor man's LiveUpgrade, but LU doesn't currently support zfsboot. And, really, this just seems much quicker and easier to deal with.

There are plenty of little things to figure out still (like which filesystems are required to be on the BE for the BFU to work, so I don't end up with data being snapshotted forever), how to deal with package upgrades, and the the like. But overall... very, very cool.

Another thing to note is that everything above was gleaned not just from documentation but from the blogs of the developers.

November 22, 2007
December 15, 2007

I have been a big fan of Patch Check Advanced, as it makes patching Solaris systems not an incredible pain in the ass.

Noted on the news section there is pcapatch, which evidently aims to safely automate pca patch installation.

I suspect I might be a big fan of that as well.

December 16, 2007

To be clear, my understanding of everything I'm about to say is very basic. It's all built on implementing work others did a few months ago, and reading up last night and this morning. If I say something ridiculous, I call nubs.

(As an aside, it appears that TCL supports USDT probes; news to me!)

Bryan Cantrill mailed me the other day after finding my previous post regarding DTrace and Perl via a post by Sven Dowideit. Bryan noted that Alan's patch pre-dated Adam Levanthal's work on is-enabled probes, which are highly useful for dynamic languages: Code is only executed when DTrace is actively tracing a given probe.

When it isn't, there should be no perf hit; the caveat seems to be that when tracing is enabled when using is-enabled probes, the hit is going to be higher than the previous standard static probes.

In the current state of DTrace in Perl (as far as I am aware), there are only two probes: sub-entry and sub-return. Compare to Joyent's work on Ruby, which has about a dozen (the diff for Ruby is over 20,000 lines, though, so obviously there's a lot more going on than just throwing some USDT probes in). When you are only interested in having what objects are being destroyed, for instance, you don't want to have the function probe toggled.

So this morning after reading a very helpful USDT example, I went ahead and modified Alan Burlinson's patch for is-enabled probes.


[20071216-10:56:50]:[bda@drove]:[~/dtrace/perl]$ diff -u perl-5.8.8-dt-alanb/cop.h perl-5.8.8-isenabled/cop.h
--- perl-5.8.8-dt-alanb/cop.h Sat Dec 15 17:15:14 2007
+++ perl-5.8.8-isenabled/cop.h Sun Dec 16 10:56:49 2007
@@ -126,6 +126,7 @@
* decremented by LEAVESUB, the other by LEAVE. */

#define PUSHSUB_BASE(cx) \
+ if (PERL_SUB_ENTRY_ENABLED()) \
PERL_SUB_ENTRY(GvENAME(CvGV(cv)), \
CopFILE((COP*)CvSTART(cv)), \
CopLINE((COP*)CvSTART(cv))); \
@@ -180,6 +181,7 @@

#define POPSUB(cx,sv) \
STMT_START { \
+ if (PERL_SUB_RETURN_ENABLED()) \
PERL_SUB_RETURN(GvENAME(CvGV((CV*)cx->blk_sub.cv)), \
CopFILE((COP*)CvSTART((CV*)cx->blk_sub.cv)), \
CopLINE((COP*)CvSTART((CV*)cx->blk_sub.cv))); \

Yeah, that was really it. I know, right?

So, now, what do my numbers look like for running the Perl test suite?

Note that all I'm doing is firing on sub-entry and sub-return with no other processing, in destructive mode (otherwise DTrace bottoms out due to systemic unresponsiveness).

static libperl, unpatched:

real 5m42.162s
user 2m28.597s
sys 0m30.161s

dynamic libperl, unpatched:

real 6m31.771s
user 3m16.823s
sys 0m31.698s

dynamic libperl, patched, standard probes, not instrumented:

real 6m33.610s
user 3m12.911s
sys 0m33.445s

dynamic libperl, patched, standard probes, instrumented:

real 9m1.302s
user 3m15.186s
sys 2m47.087s

dynamic libperl, patched, is-enabled probes, not instrumented:

real 6m44.823s
user 3m18.589s
sys 0m43.765s

dynamic libperl, patched, is-enabled probes, instrumented:

real 9m27.597s
user 3m16.791s
sys 3m6.972s

Not that big of a difference, really.

What's really interesting (to me, anyway) about the above are how dynamic libperl and both sets of patches take basically the same amount of time to complete. Compared to my previous "tests" took an extra ~40s as opposed to 10s. Here I am using Sun Studio 12; previously I had been using gcc. I imagine that might make a difference.

I suspect, though, that a number of further factors are at play: the fact that the Perl test suite's behavior is (hopefully?) nothing remotely akin to what you'd see in production, the fact that we're only instrumenting a single set of probes as opposed to having entry points in other places for comparison... Most importantly, though, I imagine whatever changes were made to Ruby might have analogies here as well.

Still, I'm interested enough now to start digging through Joyent's Ruby diff and investing Perl's internals to determine other probe points.

Maybe in a week or so I'll have something worth showing off to #p5p as Rik suggests.

Or my C ignorance will bite me horribly and I'll be forced to commit seppubukkake to save.. face?

January 3, 2008

Ben Rockwood expounds upon the joys of IPMI.

As someone who was only using it to reboot his systems (and configure the SP when I'd forgotten to do so during build), it's a pretty enlightening article.

January 20, 2008

Transactional Debian Upgrades with ZFS on Nexenta

Bloody amazing is what that is. Not because the concept is revolutionary (it's been possible with hacked ONNV installs for a while now, Indiana is doing something similar, and a few "other" operating systems have had similar capabilities), but because it's integrated and the interface itself is so obvious. It looks as easy to use as apt(8) and zfs(1M).

Very exciting stuff.

April 26, 2008

Ben Rockwood digs into some odd disk activity using your friendly neighborhood Solaris tools.

June 2, 2008
<