-- William Gibson, All Tomorrow's Parties
Note to self: Do not upgrade a Perl distribution by two full dot-releases without also recompiling mod_perl. This can end up causing a minor, ten-minute-long headache after the machine is rebooted for a kernel upgrade and the httpd is finally flushed out of memory and attempts to restart.
And proceeds to segfault.
Took me a few minutes to remember "EVERYTHING=1", as well, which makes the world just full of bunnies. Blue ones. With wings on.
Work has been very interesting this week.
I was informed, in an extremely off-hand manner, that it was a definite that we would be moving operations of our primary building to another -- probably over to the photography studio building.
That building is mostly factory floor space, filled by a press and a book bindery. The office area is taken up by the photo studio, and several smaller office-worthy (read: carpeted, not near heavy equipment) areas. We have several large printers we'd have to move over, not to mention all the people: Pre-press, CSRs, development.
We aren't even sure where we'd put the servers at this point, but we have a few ideas. Our current building is incredibly nice, and the CTO who was responsible for the goodness there is in charge of this move as well, so I'm not too worried about the final layout being not-usable. Hopefully we can get development cordoned off somewhere, as our other co-workers can be very loud (and engage in tapeball warfare at random times).
While I think that consolidating the business into one building is a good move, this isn't going to be a lot of fun.
If nothing else, it forces me to do a number of things that I've been kind of spinning my wheels on, like moving our MX and DNS machines to our colocation facility.
I also ran a quick inventory today, and it came out to about 49U of rackable gear, with about a dozen mini-to-mid-tower server boxes. That doesn't include the tape robot, or the tape archives themselves.
I guess it's a good thing I refactored the LAN last year. This would have been an enormous pain in the butt with the old setup, instead of just a pretty big one. :-)
Just got done with my brief pseudo-meeting with the COO, explaining that while we could in theory sell Archivist as a stand-alone product, he understood how unlikely that was (and if it ever became sellable, we could just offer support and customizations for it instead).
His comment was the same as it's been for the last three years when I ask him these questions (for stuff that isn't really worth releasing, unfortunately): "If we're gaining from open source, or Free, software, I think we, morally, should give something back."
Which just plain rules.
Today has been incredibly awful so far, and shows no sign of letting up. I got into work at 0715 and had run completely out of patience by 0900.
- The primary fileserver is full. The secondary fileserver started doing this awesome thing where if you tried to get a shared volume list on an OS9 machine, the client would lock. On OS X, Finder would freeze. So I poke around with netatalk, and determine it's only this one share that's causing the problems. The production share. Of course. Quickly ascertain that there's some filesystem corruption going on and that I'm going to have to rebuild the journal trees. Doing this with reiserfs has always freaked me out so I'm copying all the data (170G) to the tertiary fileserver, which is actually a backup staging box. It has 600G on it, however, so it's doable. So I install netatalk on it, and recompile the kernel... and wait. And wait. For this data to copy. This is awesome.
- About a half hour before I started dealing with that, the billing server blew its root drive. This is a 15 year old machine running UnixWare 3.0. Great.
- We're moving the rest of the accounting system and apparatus today. This primarily consists of a ten year old DOS/NetWare box and a Linux backup machine which mirrors it. The problem is that accounting is on its own physical network, which is where it should be. But as the new building doesn't have multiple networks in it (ha), I have to run several hundred feet of cable through the ceiling, which is a good twelve feet high. Lovely.
There is more to complain about, but Adam is insisting we go to lunch now.
Two years ago: "We should replace this. The machine is non-portable, the application does a lot of weird voodoo and we have no good options if it crashes and burns."
One year ago: "Well, it seems to run okay on a local box thanks to compatibility stuff in Windows, but this is still not good."
Friday: "I hate all of you."
Things that broke today:
- Several workstations randomly stopped seeing the AppleTalk network.
- I fixed this by going into Network Browser, then back into Chooser. Unfortunately, one of the machines continued to just lock up when a server was accessed. Then it stopped. Just randomly. I've jokingly said that Macs (OS9) are things that sometimes Stop Working, and will then Start Working again without reason. Now I'm not joking. Fuck Mac OS <X.
- When the NetServer died a month ago, it was because the RAID5 array had been running in a degraded (one disk dead) state for perhaps a year. Then a second disk died. So we sent it in to get the array recovered. It came back, no problem, except that all the metadata is now gone, probably hidden somewhere on the root drive on the NetServer itself (I assume Windows would keep that information there, and as there are no dotfiles on the recovered disk, I guess I'm borne out there), but I can't get at it because I don't have the fucking keys to the machine, and obviously it won't boot without a keyboard when the array is fucked.
- I can edit the files manually in resedit, solios informs me, but there are over a hundred thousand files. I guess I get to learn how to script on OS 9. Though, I can't think of a reason I couldn't do this on OS X... Hm... Monday.
I could go on. And on. And on. I was so full of rage by the time I left the office, I decided it would be a good idea to walk the 20 blocks from the train station to my apartment in an attempt to wear myself out. Unfortunately, I decided to walk down South St., while listening to NIN's "All That Could Have Been" way too loudly.
So now I'm tired, full of rage, and my ears are ringing.
I'll be spending the better portion of the day at a client site, theoretically doing a network and systems audit.
This should be entertaining.
I have an Xserve waiting in my office today.
And a lot of broken stuff that threatens to drown me in its brokenness.
The weights don't tip that far.
Decided to go out to the colo tonight and fix what was the primary database server. By "fix" I mean reboot the damn thing and recompile the kernel to use the old Adaptec 7k driver.
What a fucking waste of time. I didn't remember the new code, didn't have my laptop, they changed the wlan setup and I wasn't on the ACL, and my cell phone is dead.
I also decided, while listening to Philly "party" radio, that I want some club music.
Yesterday? When I had that super migraine? And I threw up and wanted to bleed out my eyes? I think something broke.
We finally got a "real" box for our backup mirror and transfer point. P4 2.6G, 512MB RAM, dual 1000BT, enough room in the case for seven drives. Of course, the power supply only has connectors for four, but that's okay. Two PCI-X slots as well, so I can feed the 64-bit RAID controller we have in it once we get some bigger drives (something else I think I'm going to insist on).
It took about twenty minutes to put together, and in another five, I'll have OpenBSD 3.6 installed on it. Probably ten minutes after that, I'll have a sync running off the production server.
OpenBSD is teh lurve for just straight up getting shit done.
I'm pulling a night shift today. Coming up on the end of it now. As soon as Adamk gets into the office, I'm outta here. Gratingly, I only got a third of what I wanted to accomplish done...
1) Install firewall at colo.
2) Get a current image of the production data at the colo, so we can start syncing it again.
3) Swap DHCP/DNS servers on company LAN.
Only #1 got finished. I had to wait two hours at the colo, spinning my wheels, waiting for their freaking arp tables to update. Two hours! To update the arp cache!
#2 is still running right now. Hopefully OpenBSD's ccd are portable between machines. If they hide device info on the system, at least I didn't really waste much time. If I have to drag another machine down there, I will. Four hours later, it's at 210G of 280... sigh.
I didn't get back soon enough to do #3. I wanted to be back here by 0300. I rolled back into the parking lot at 0500. Awesome.
There was some grungy looking guy standing outside the back of the colo building, too, smoking pot. The colo is out in the middle of nowhere, at the ass-end of this business park behind a Wal-Mart.
As I was dragging the firewall into the building, the guy yelled at me: "Yo, I jus' smokin' a cigarette!"
"Just a cigarette!"
I hate New Jersey. Two more days!
Bryan Allen: This is hard to get used to.
Bryan Allen: Stuff like a.b.c.1 not being a network device/system.
Bryan Allen: But just a user.
Andrew Brennan: ha!
Andrew Brennan: you'll have to drink CIDR the next time we're out ... it helps.
Bryan Allen: BOOOO
I spent half the day playing with Log::Dispatch and the other half fighting with CGI::Application, which I don't recall requiring any work to make work at all when I started writing Archivist last year.
Log::Dispatch is super cool, though. I think I'll be playing with instantiating several objects for different log levels so I can log to different places (since I'm probably just going to be logging to files for the time being). L::D::File::Timestamped is kind of confusing, though. I can't think of many situations where you'd want to write to $file-$timestamp, which is really not what I thought it would do. :)
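The "several outputs, each with its own minimum level" idea looks roughly like this. It's only a sketch, and it uses Python's standard logging module as a stand-in rather than Log::Dispatch itself; the logger name and the file names are made up for illustration.

```python
import logging
import os
import tempfile


def build_logger(log_dir):
    """Return a logger that writes everything to one file and errors to another."""
    logger = logging.getLogger("archivist-sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    logger.propagate = False

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Output 1: everything from DEBUG up.
    all_handler = logging.FileHandler(os.path.join(log_dir, "all.log"))
    all_handler.setLevel(logging.DEBUG)
    all_handler.setFormatter(fmt)

    # Output 2: only ERROR and above.
    err_handler = logging.FileHandler(os.path.join(log_dir, "error.log"))
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(fmt)

    logger.addHandler(all_handler)
    logger.addHandler(err_handler)
    return logger


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    log = build_logger(log_dir)
    log.info("session started")
    log.error("SNMP poll failed")
    print("logs written to", log_dir)
```

Each handler filters on its own level, so adding a third destination later is just another handler and doesn't touch the calling code.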
Tomorrow is finishing CGI::App, Sessions, and starting up with Net::SNMP!
I got my MacMini today. It's pretty super. I wasn't expecting it until Monday, so checking the FedEx site this morning and seeing it was on the truck for delivery was a pretty nice way to start the day. I had time to set it up and burn the latest Tiger build before heading out. "Nice," I thought, happy with how the day was going.
Then I got to work and it all went to shit.
Machines falling over, all sorts of lameness being talked about.
The box I was using for dev, wiki and issue tracking decided that one of its disks just didn't feel like working any longer, so it ate it. Many bad blocks, and then fsck decided that /usr/lib and /usr/libexec were not necessary directories for UNIX to operate.
< bda> So they're letting us out early, and Paul (one of our managers) is going around letting us know.
< bda> Jeff: "Is it snowing or something?"
< bda> Paul: "er... no. It's Easter weekend."
< bda> bda: "So wait, only the good, god-fearing Christians get to go home early?"
< bda> Paul: "Exactly."
< bda> bda: "You bastards. Not only do I not get to go home, I have to burn in Hell, too?"
< bda> Paul: "Yup."
< bda> :(
And now instead of going home and playing WoW on my new mini I'm going to sit here and make this machine work again, because it made me angry.
Spent the day playing whack-a-mole with a worm. It was mildly entertaining for the first hour... then it really started wearing thin. And it's not over yet. The initial box looked to be an owned Linux machine, probably popped via an unpatched cPanel install. A second infecting machine is still running, however, and tomorrow will be lots of fun dismantling the botnet the skiddies put together today.
Yay for infosec.
There was one seriously braindead moment for me late into the afternoon where I was staring at tcpdump output, trying to figure out why one of the comp'd hosts was synflooding the C&C. It wasn't until I talked to Harry that it became obvious that the C&C had been taken down and the worm was astoundingly poorly written. It would spam a SYN, get a RST, then loop, immediately. Awful.
It did bring up an interesting point, though... all the source ports were in the 2100-2200 range. Made me wonder if that was something the author had specified or if that was just how Windows manages source port allocation. I don't know anything about it, really.
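For what it's worth, when a client never calls bind() the OS picks the source port itself, out of whatever ephemeral range it is configured with. A tight cluster of source ports is consistent with that, though it doesn't prove the author didn't pick them. Here is a minimal sketch to see what a given OS hands out; the function name is mine, and it only talks to a throwaway listener on loopback.

```python
import socket


def sample_source_ports(count=5):
    """Open `count` loopback connections and return the OS-chosen source ports."""
    # A throwaway listener on loopback; port 0 asks the OS to pick a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(count)
    target = listener.getsockname()

    ports = []
    clients = []
    try:
        for _ in range(count):
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # No bind() here, so the OS assigns the source port on connect().
            c.connect(target)
            clients.append(c)
            ports.append(c.getsockname()[1])
    finally:
        for c in clients:
            c.close()
        listener.close()
    return ports


if __name__ == "__main__":
    print(sample_source_ports())
```

Running it on the OS in question and comparing against the ports seen in tcpdump would at least say whether the range is the OS's doing.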
Tomorrow will likely be more of the same, though rather than doing it all manually I will definitely be automating a good portion of the process.
It could only have been more tedious if I had to run every freakin' SNMP query to kill the ports by hand...
The source repos are up to date in those browsers, so feel free. Once the code is done, there will be a public svn server, and hopefully some kind souls will be interested enough in it to ask for commit access and hack on it with me.
I am a slow freakin' coder, and not a very good one, I think. But it all seems to work; and if not very efficiently, well, that's what refactoring is for...
It seems like I'm getting one section done a day in the manager. Yesterday I finished the majority of the base perms management functionality, and today I tore through the roles stuff. There are one or two basic bugs plaguing roles, but otherwise it works.
Once users and groups are finished, I can add all the actual access checks to them, which should be... entertaining.
It's definitely been a learning experience.
The annoying thing is that once I'm done, I have one small project that will use it, and then one giant freakin' octopus of a project that will as well.
(Alternatively: "You hack on that thing? You're braver than I thought.")
Know what I hate more than writing code?
Fixing my own broken-ass code.
Know what I hate more than fixing my broken-ass code?
Fixing someone else's.
I like when you pull a tab-delimited file from a vendor, and when you parse it, a good majority of the fields don't match up between entries. That fills me with confidence and makes me want to continue paying for their services.
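A quick sanity check for this kind of file is to count fields per row and flag the rows that disagree with the majority. This is a rough sketch, not the vendor's format: the sample data and the function name are invented, and "most common count wins" is an assumption about what the file is supposed to look like.

```python
import csv
import io
from collections import Counter


def field_count_report(lines):
    """Return (expected_count, [(line_number, count), ...]) for mismatched rows."""
    rows = list(csv.reader(lines, delimiter="\t"))
    counts = Counter(len(row) for row in rows)
    if not counts:
        return 0, []
    # Treat the most common field count as the "expected" one.
    expected = counts.most_common(1)[0][0]
    bad = [
        (n, len(row))
        for n, row in enumerate(rows, start=1)
        if len(row) != expected
    ]
    return expected, bad


if __name__ == "__main__":
    sample = io.StringIO(
        "id\tname\tprice\n"
        "1\twidget\t9.99\n"
        "2\tgadget\n"
        "3\tgizmo\t4.50\textra\n"
    )
    expected, bad = field_count_report(sample)
    print("expected fields:", expected)
    for line_number, count in bad:
        print(f"line {line_number}: {count} fields")
```

It accepts any iterable of lines, so an open file handle works the same as the in-memory sample.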
I also really like when people use non-portable functions, because obviously your software is always going to run on Linux.
And my most favoritest thing: When people glob `rm`, using relative paths from `$PWD`. Good jobs, guys, I didn't want to keep `../` around, whatever the hell `../` is anyway!
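The less exciting way to do a globbed delete is to anchor it at an absolute directory and refuse anything that resolves outside of it. A minimal sketch, assuming a POSIX-style filesystem; the function name and the dry-run default are my own choices, not anything from the code I was fixing.

```python
import glob
import os


def safe_glob_remove(base_dir, pattern, dry_run=True):
    """Remove regular files matching `pattern` under `base_dir`, never outside it."""
    base = os.path.realpath(base_dir)
    removed = []
    # Anchor the glob at an absolute base instead of whatever $PWD happens to be.
    for match in glob.glob(os.path.join(base, pattern)):
        if os.path.islink(match):
            continue  # leave symlinks alone rather than guess what they point at
        real = os.path.realpath(match)
        # Refuse the base itself and anything that resolves outside it (e.g. "..").
        if real == base or os.path.commonpath([base, real]) != base:
            continue
        if os.path.isfile(real):
            if not dry_run:
                os.remove(real)
            removed.append(real)
    return removed
```

With `dry_run=True` it only returns what it would have deleted, which is a cheap way to find out what `../` is before it's gone.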
Tuesday and yesterday I spent an inordinate amount of time making slides for an "intro to version control" talk I'm giving to the systems group on Friday.
I'm not sure how good it is; it's certainly very long. Adam made the comment that his dissertation was only 70 slides, for four years of work. Mine tops out around 160 and essentially covers the first four chapters of The Subversion Book, and is targeted at people who only have a very vague idea of what change management/version control is. Adam suggested that 160 slides means they should just read the fucking book themselves, and I find it hard to disagree.
Sunflare on SILCNet gave me an incredible amount of help in terms of copy editing (70 plus fixes!) and asked if he could use it (he also works as a .edu), so I suppose it means it can't be too bad.
Anyway, here is the PDF. It's an export from Keynote, so it's freakin' huge (every page is a tiff, or something silly). The PPT export was 44MB. And the SWF was 180MB. That was funny...
As always, comments welcome. Hopefully the talk will go okay. :-)
Today was a minor personal milestone at work: Our Solaris systems (the very few there are at the moment, comparatively) are no longer second-class systems. We have an organic software management framework which has had the hands of several admins and programmers on it over the last five or six years (or longer, maybe, in some pieces of code).
After a fair amount of tedious toiling, I got the scripts mostly Solaris-friendly (no more `hostname -s`) and they can now have code pushed to them. This makes them immensely more useful, as you might imagine. I have no doubt that there's a lot of little fixes needing to be done, but the fact that fixes can be pushed trivially makes it massively less painful to do.
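The portable way around `hostname -s` is to take the full name and cut it at the first dot yourself, rather than trusting the flag to exist. A tiny sketch of that idea (our scripts aren't Python; the function name and example host are just for illustration):

```python
import socket


def short_hostname(name=None):
    """Return the host name up to the first dot."""
    if name is None:
        name = socket.gethostname()
    return name.split(".", 1)[0]


if __name__ == "__main__":
    print(short_hostname())                     # this machine's short name
    print(short_hostname("web01.example.com"))  # hypothetical FQDN
```

It doesn't care whether the system reports a bare name or an FQDN, which is exactly the property the per-OS flag doesn't give you.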
The next step is to figure out a good method for deployment of the PAR distributions we've been building for CPAN modules we use. Once that's done, we can start packaging up our own modules and distribute them the same way.
And hopefully early next week, we can start collapsing production services into Solaris zones.
Huzzah to that.
Speaking of zones, once I clean up my add-a-zone script (imaginatively named `newzone.sh`), I will probably publish it. It seems to be the trendy thing to do.
After talking about it for at least half a year, last night I finally started really reading up on Puppet. After watching a BayLISA talk by Luke Kanies (p1, p2), I installed it on one of my Solaris 10 test boxes, installed a couple test zones, and started screwing around with it.
I gotta say, it's super easy to get it up and running. The hardest part, conceptually, is going to be modeling the environment in such a way that won't require repeated major refactoring every other week. Minor tweaking, sure, but ripping walls down would get old quick. Thankfully there are documents like Puppet Best Practices to get you going. There's also a fair amount of code under the hood already, and determining how much of it will be usable to me is going to be fun. The zones management type looks really, really useful considering how heavily we currently use zones, and how that isn't going to do anything but increase.
This week I really hope to have all my system tests (written in Test::More) ported over to Puppet in some relatively sane manner.
Configuration management systems are one of those things that simply make your life less hateful.
The way the last few weeks have gone, I'm going to have to start focusing more heavily on automating everything that can be automated, or spiral further into frustrated insanity.
< confound> https://trac/wiki/DrinkOrders
< bda> There's no gin on there.
< confound> reload
Things we will not order:
< bda> LAME
< confound> HTH
< bda> TTTH
< confound> what
< bda> Talk To The Hand.
< bda> Beyotch.
< confound> sorry, I didn't realize we were time-travelling to 1990
< bda> I am wearing my TARDIS boxers today.
< confound> inside they're the size of a warehouse
Over the last two weeks we (read: rjbs) migrated our Subversion repositories to git on GitHub. I was not very pleased with this for the first week or so. By default, I am grumpy when things that (to me) are working just fine are changed, especially at an (even minor) inconvenience to me. That is just the grumpy, beardy sysadmin in me.
After a bit more talking to by rjbs, things are again smooth sailing. I can do the small amount of VCS work I need to do, and more importantly: I am assured things I don't care about will make the developers' lives much, much less painful, which is something I am certainly all for.
git is much faster than Subversion ever was, and I can see some features as being useful to me eventually. Overall, though, what I use VCS for is pretty uninteresting, so I don't have much else to say about it.
I had a couple basic mental blocks that rjbs was able to explain away in a 20 minute talk he gave during our bi-weekly iteration meeting. It was quite productive. There are pictures.
Work has otherwise consisted of a lot of consolidation. I have finally reduced the number of horrible systems to two. Yes. Two. Both of which are slated for destruction in the next iteration. Not only that, I have found some poor sucker (hi, Cronin!) to take them all off our hands. Of course, they'll be upgrading from PIIIs, so...
I also cleaned up our racks. A lot. They are almost clean enough to post pictures of, though I'll wait until I've used up more of the six rolls of velcro Matt ordered before doing that.
Pretty soon we'll have nothing but Sun, a bit of IBM, and a very small number of SuperMicros. My plans are to move our mail storage from the existing SCSI arrays to a Sun J4200 (hopefully arriving this coming week). 6TB raw disk, and it eats 3.5" SATA disks, which are ridiculously cheap these days. I really, really wanted an Amber Roads (aka OpenStorage) J7110, but at 2TB with the cost of 2.5" SAS, it was impossible to justify. If they sold a SATA version at the low-end... there has been some noise about conversion kits for Thumpers, but that's also way outside our price range.
I doubt conversion support will become more common, but if I could turn one of our X4100s and the J4200 into an OpenStorage setup, I would be incredibly happy. If you haven't tried out the OpenStorage Simulator, I suggest you do so. Analytics is absolutely amazing.
People on zfs-discuss@ and #opensolaris have been talking about possible GSoC projects. I suggested a zpool/filesystem "interactive" attribute, or "ask before destroy." However you want to think of it. Someone else expanded on that, suggesting that -t be allowed to ensure that only specified resource types can be destroyed. I have yet to bone myself with a `zfs destroy` or `zpool destroy` but the day will come, and I will cry.
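Until something like that exists in ZFS itself, the poor man's version is a wrapper that makes you retype the dataset name before it runs `zfs destroy`. This is only a userland sketch of the "ask before destroy" idea, not the proposed attribute; the function name is mine, and `ask` and `run` are injectable so it can be tried without a real pool.

```python
import subprocess


def confirmed_destroy(dataset, ask=input, run=subprocess.run):
    """Run `zfs destroy <dataset>` only if the operator retypes the dataset name."""
    answer = ask(f"Type the dataset name to destroy {dataset!r}: ")
    if answer.strip() != dataset:
        print("Names do not match; nothing destroyed.")
        return False
    # check=True raises CalledProcessError if zfs exits non-zero.
    run(["zfs", "destroy", dataset], check=True)
    return True
```

Retyping the name instead of answering y/n is deliberate: muscle memory will happily type "y" for you, but not "tank/mail".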
I see a pkgsrc upgrade in my near future. I've been working on linking all our Perl modules against it, and I want to get the rest of our internal code linking against it as well. It will make OS upgrades so, so much easier. Right now, most code is either linked to OS libraries or to an internal tree (most of which also links to OS libraries).
We've almost gotten rid of all our Debian 3.1 installs, which is... well. You know. Debian 5.0 just came out, and we've barely gotten moved to 4.0 yet. Getting the upgrade path there sorted out will thankfully just be tedious, and require nothing clever.
I really hope that the Cobbler guys get Debian partitioning down soon, and integrate some Solaris support. I tried redeploying FAI over Christmas and man, did it so not work out of the box. I used to use FAI, and was quite happy with it. I had to hack it up, but... it worked pretty well. Until it stopped.
If Cobbler had Solaris support, I would seriously consider moving our remaining Linux installs to CentOS. We use puppet already, so in many ways Cobbler is a no-brainer. We are not really tied to any particular Linux distribution, and having all our infrastructure under a single management tool's ken would be really nice. To put it mildly.
30% curious about OpenSolaris's Automated Installer project, but it's so far off the radar as to be a ghost.
I picked up John Allspaw's The Art of Capacity Planning, and it's next on my book queue. Flipping through it makes me think it's going to be as useful as Theo S.'s Scalable Internet Architectures.