Today sucked

So, let's see. What happened today.

First, I started trying to hack support for continuing to do examine and enable/disable through the K4 kadmind that will proxy password changes to K5. We need to have all the password changes go to K5 (which will then propagate them back to K4), but the propagation code doesn't handle enable/disable (I believe). We also have interfaces that depend on examine working properly, and while I can just have that interface print out fake K4 information (or parse kas output), it would be nice to have that work. That was some incredibly annoying bit of coding, though, as XDR started fighting between AFS and K5 and I had to suck in additional bits of our old broken kadmind.

Then, once I thought I had that working, I tried to build it on Solaris (which is where we'll have to run it since our VLDB servers are currently Solaris), which required a bunch more annoying and fragile changes.

In the meantime, I discover that tomorrow I have to go to a meeting in which we get to discuss whether we're going to shoot ourselves in the foot by deciding we're no longer officially supporting a major part of our infrastructure because it's too hard, where things that I said were taken out of context and used to justify this position.

Then, the kadmind doesn't run on Solaris. Or rather it starts, but then each time one connects to it to try to do something, that fails and the forked child process crashes. I start tracking that down to it trying to open files that it really has no business wanting to open, and then discover that I can't easily restart it for testing. Despite the fact that it's trying to set SO_REUSEADDR, it's not working, and each time it crashes it leaves sockets in TIME_WAIT.
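For reference, here is a minimal sketch of the usual SO_REUSEADDR pattern for a listening socket. It is not the kadmind code, and I don't know that this is what went wrong there; the function name `make_listener` and its defaults are made up for illustration. The one detail it shows is that the option has to be set before bind(), since setting it afterwards does nothing for the bind that has already happened.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket with SO_REUSEADDR set before bind().

    Hypothetical helper for illustration only; port=0 asks the kernel
    to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before bind(); this is what lets a restarted server
    # rebind a port while old connections on it are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    listener = make_listener()
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

Checking the option with getsockopt() right after setting it is a cheap way to confirm that the call actually took effect on a given platform.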

Frustrated by having to wait five minutes to try a new build, I decide to just reboot the system, since that will take less than five minutes to come back up. Except that, upon reboot, the system decides that it can no longer read its kernel off disk.

After a half-hour of fighting with that, I try to boot it off the network so that I can at least mount the drive and see if I can repair things, given that this is the test environment that I've spent the last month setting up and some of the setup pieces are annoying and took me a while to get right. But it can't find a RARP server. So I try to go add it to our Jumpstart server, which is also a RARP server, discover that I can't remember the right syntax, fight back and forth with that for a while (it takes a minute each time to add or remove the client because Jumpstart is just that slow), and then discover it still doesn't work.

Oh, and in the meantime, the new OpenAFS release candidate immediately segfaults on AMD64 2.6.18 kernels because the kernel folks have moved things around in yet another new and exciting way.

Giving up on the Solaris system for right now, I decide to just reconstruct my test cell on Linux, where it's much faster, at which point I discover that our local kaserver build doesn't support -noauth so I can't bootstrap a new K4 realm. Build the latest version, still no luck. Finally figure out that it's looking for the NoAuth file in a stupid place, get that bootstrapped, get the environment set up for running kadmind, build the kadmind proxy on Linux, and it segfaults. Further analysis with a debugger reveals that it's corrupting its own K5 context while calling one of the internal functions that you're not supposed to use but that this code has to. Wondering if I broke something, I go back to the original code before I hacked the K4 stuff into it. Same thing. Note that this was all working on Solaris and Solaris had other issues.

I've given up and mailed the developer for help. Oh, and Debian is now starting an Apache 2.2 transition, but I can't build new WebAuth packages for 2.2 because apache2.2-common is uninstallable. Bug filed, which has yet to show up in the Debian BTS for reasons that escape me.

This is really one of the worst days that I can remember having in quite a while. I will be going to bed tonight with more work left to do than I had when I started this morning. Oh, did I mention that I have no line manager and other services that my group runs are currently having serious problems that are tapping out the other people in my group? And tomorrow I get to go talk about whether we should blow a hole in the other foot. Yay.

Posted: 2006-10-02 21:01
