Eagle's Path: January 2013

2013-01-01: 2012 Book Reading in Review

For the year of 2012, I finished and reviewed 60 books, the same number as 2011. Given how stressful and chaotic much of the year was, I count this a major triumph. The hardest part was not the reading but the review writing; the end of the year required a significant push to finish writing reviews of all the books I'd read that year, and at one point I was reviewing things I'd read more than two months earlier. But I can enter 2013 entirely caught-up.

This continues to feel like about the right pace, striking a balance between reading enough that I can pursue multiple reading goals at the same time, while leaving enough time for other projects and video games.

I gave three novels a 10 out of 10 this year, but the stand-out even among that group was Elizabeth Wein's Code Name Verity. Not only was it my favorite book of the year, but it was one of the best books I've ever read. The other two 10-rated books are also highly recommended, of course: Matt Ruff's fascinating novel of multiple personalities, Set This House in Order; and C.J. Cherryh's SF classic, Cyteen. The latter was a re-read in advance of reading the sequel, Regenesis, which I also recommend if you've read and liked the last third of Cyteen.

Other fiction highlights of the year were China Miéville's Embassytown and Suzanne Collins's The Hunger Games trilogy, particularly the third book, Mockingjay. Embassytown continues the trend from Miéville's The City & The City of tighter, faster-moving novels while moving into space opera and first contact territory. It keeps Miéville in the top rank of current SFF writers, despite having some suspension of disbelief problems.

Collins's world-building in The Hunger Games also has suspension of disbelief problems, but I understood why the series was so popular. I was caught by surprise by its look at violence and its after-effects and was particularly impressed by Mockingjay and Collins's chosen ending. I think this is a popular series that lives up to the hype.

There were no 10 ratings in non-fiction this year, but several books nonetheless stood out. John Kenneth Galbraith's The Affluent Society is a thought-provoking book that questions the foundation of a consumer-oriented economy while perceptively pointing out how it distorts choices away from public goods. Susan Cain's Quiet is a passionate defense of introversion in an extroverted world. And, finally, Joshua Bloch's Effective Java and Damian Conway's Perl Best Practices are both insightful looks at the good and bad of their respective languages and taught me a great deal as a practicing programmer.

One final highlight to mention: Eclipse Phase by Posthuman Studios is an excellent RPG sourcebook, at least from the perspective of interesting world-building. I haven't played it, but I thoroughly enjoyed reading it (and have acquired nearly all of the supplements).

The full analysis includes some additional personal reading statistics, probably only of interest to me.

2013-01-02: podlators 2.5.0

podlators is the package that contains Pod::Man and Pod::Text and the pod2man and pod2text driver scripts, which convert Perl's POD documentation format into text and *roff (for man pages) output.

This release is a "clearing the decks" release that rolls up various smallish feature requests and bug fixes that have been pending for a while. I want to do the same reworking to podlators that I did to Term::ANSIColor, but it's going to take longer since it's quite a bit more code. (And Term::ANSIColor took me about two days.) I didn't want to block the various pending issues on that work, so this tries to clean up all the open bugs I was aware of other than sorting out the defaults for Unicode output. (That's a much larger problem and will probably be fixed in a major release.)

The primary change is that pod2man and pod2text now die by default if the POD contains syntax errors. This will hopefully flush out numerous POD documents around the world that have been carrying POD ERRORS sections around forever. I considered just warning by default but not failing, since this will also cause CPAN builds to fail, and I may fall back to that if there are too many complaints. But usually fixing the POD syntax is straightforward, and normally parsing programs will fail on syntax errors.

Note that the defaults for the modules, Pod::Man and Pod::Text, continue to be to add a POD ERRORS section, since aborting inside a module isn't very friendly to the caller.

There is a new option for both the modules and the driver scripts to set the error handling behavior, which can be set to die, report to standard error but continue, add a POD ERRORS section, or ignore errors entirely.

Also in this release is a new nourls option that suppresses the URLs in text and *roff output for L<> codes that have anchor text, which can declutter some POD texts, and fixes for X<> formatting codes containing whitespace and small-caps handling of paragraphs consisting entirely of all-caps text.

You can get the latest release from the podlators distribution page.

2013-01-03: spin 1.79

spin is the program that I use to generate all of my web pages, including this journal, from a bespoke macro language. I've been slowly evolving it since 1999 and really do intend to turn it into a proper Perl module one of these days....

My web site was one of the few places left that I was still using Subversion, and I finally decided to do something about that during this vacation. The main reason why it wasn't trivial was that I was using Subversion Id strings to get a more accurate last change date for files rather than just using file timestamps. (Nit-picky, I know, but if I'm not allowed to be nit-picky on my own web site, where can I be?) This release recognizes when the source directory is a Git working tree and supports determining the last modification date from git log instead (by way of the Git::Repository Perl module).
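As a rough illustration of the approach (written in Python rather than the Perl and Git::Repository that spin actually uses, with hypothetical function names), the last change date can be read from git log, falling back to the file timestamp when the file is not tracked:

```python
import os
import subprocess
from datetime import datetime, timezone


def parse_git_timestamp(output):
    """Turn the output of `git log -1 --format=%ct` into a datetime, or None."""
    text = output.strip()
    if not text:
        return None  # no commits touch the file (untracked, or no repository)
    return datetime.fromtimestamp(int(text), tz=timezone.utc)


def last_modified(path):
    """Prefer the date of the last commit touching path; else use its mtime."""
    directory = os.path.dirname(os.path.abspath(path))
    stamp = None
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%ct", "--", os.path.basename(path)],
            cwd=directory, capture_output=True, text=True, check=False,
        )
        if result.returncode == 0:
            stamp = parse_git_timestamp(result.stdout)
    except OSError:
        pass  # git is not installed; fall through to the file timestamp
    if stamp is None:
        stamp = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return stamp
```

The %ct format asks git for the committer date as a Unix timestamp, which avoids parsing a human-readable date.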

I also updated all my various (not released) helper scripts for maintaining the site to use Git rather than Subversion, which mostly involved changing svn to git and removing some fiddly bits that Git doesn't require.

Of course, this means that I also removed the \id thread commands on all my pages, which in turn means that I just updated the last modification date of every file, making it somewhat hard to test since every thread-generated page on my site has now been modified today. *heh*

You can get the latest release of spin from my web tools distribution page.

I also released version 1.44 of my release utility, which I use to help automate some of the work of generating software releases. The change there is to use Git rather than Subversion to commit changes to the .versions file that drives some of the software-related parts of my web site. You can get the latest version of release from my scripts page.

2013-01-03: cl2xhtml 1.10

cl2xhtml converts GNU ChangeLog files to XHTML for (hopefully) pretty web display. After making the most recent Term::ANSIColor release, I noticed that it didn't support the multi-author ChangeLog convention. This is not in the GNU standards, but it's become conventional to note changes by multiple authors with the syntax:

2013-01-03  Some Author <author@example.com>
            Another Person <person@example.com>

        * file.c: Some change.

cl2xhtml now understands that syntax, including with more than two authors, and turns it into a list of authors on the HTML page separated with commas and "and" as needed.
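A minimal sketch of that parsing and joining, in Python rather than cl2xhtml's own Perl. The function names, the regular expressions, and the use of a serial comma are assumptions for illustration, not cl2xhtml's actual behavior:

```python
import re

# An entry header: date, whitespace, author name, address in angle brackets.
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")
# A continuation line naming another author: leading whitespace and no date.
EXTRA = re.compile(r"^\s+(.+?)\s+<([^>]+)>\s*$")


def parse_authors(lines):
    """Return (date, [names]) from the header lines that start one entry."""
    match = HEADER.match(lines[0])
    if not match:
        raise ValueError("not a ChangeLog entry header: %r" % lines[0])
    date, names = match.group(1), [match.group(2)]
    for line in lines[1:]:
        extra = EXTRA.match(line)
        if not extra:
            break  # a blank line or change text ends the author list
        names.append(extra.group(1))
    return date, names


def join_authors(names):
    """Join names with commas and "and" the way English prose would."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]
```

Applied to the sample entry above, parse_authors returns the date and the two names, and join_authors turns them into "Some Author and Another Person".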

You can get the latest version of cl2xhtml from my web tools page.

2013-01-04: Automated Perl testing

I started several coding projects today, but none of them reached an announceable conclusion, so this seems like a good day to write up my current discoveries in automated Perl testing modules.

This post isn't about test frameworks like Test::More and friends, or even about helper modules (although I do recommend Test::Warn if you want to test warning code in your modules). It's not about writing functionality tests. Rather, it's about various CPAN modules that can be used almost identically in just about every Perl module distribution (and in some other places, like Perl scripts inside a larger package written in another language). You can drop a small driver script into your t directory and (mostly) run make test with no further customization.

If you want to see the driver scripts for all of the below that I'm currently using, see the t directory of the Term::ANSIColor distribution. They'll slowly make their way into my other modules, and probably, despite the name of the package, rra-c-util, since that's my equivalent of gnulib and I have tools to synchronize shared code from that package. The driver script generally tests whether the module is installed, skips the test if not, and otherwise dispatches to the "test everything" routine in the module.

Test::MinimumVersion: This tests that you're not accidentally using a newer Perl feature than you were expecting. It's very handy if you want to support backward compatibility to a particular version of Perl. It knows about features as far back as 5.004_05. It won't be able to catch everything, but it will catch some of the most obvious things.

Test::Perl::Critic: This is sort of the thousand-pound gorilla. Test::Perl::Critic runs all of your code through perlcritic and reports any errors. You will probably need to combine this with a perlcriticrc file in your distribution that you configure it to use, since otherwise the person running the tests may have customized perlcritic. If you let it run perltidy to check for formatting problems, you'll also want a perltidyrc. See the driver script mentioned above for how I set that up. I usually configure this test to only run for me (by checking an environment variable) since the results vary a lot depending on what version of perlcritic is installed, and it's likely new versions will cause the test to start failing. If you use this, expect to need to add ## no critic annotations to your code.

Test::Pod: Checks for syntax errors in your POD documentation so that you won't have those embarrassing POD ERRORS sections. Note that the latest version of podlators turns syntax errors into fatal errors, so soon you won't have to worry as much about this. But if you'd been using this module all along, you wouldn't have to worry about the fact that pod2man may start failing due to syntax errors you never fixed!

Test::Pod::Coverage: Only of interest to module distributions, not random scripts, this checks that every sub provided by your module has POD documentation. It understands various subs that don't need to be documented (such as the ones you provide to implement a tied class) and lets you mark some subs as not requiring documentation.

Test::Pod::Spelling: Checks your POD documentation for spelling mistakes with a spell checker. (Note that perlcritic, and hence Test::Perl::Critic, can also do this, but I use this module separately since I think the output is more readable.) You will almost certainly want to make this test maintainer-only, since spell checkers and spelling dictionaries vary a lot. But it prevents a lot of embarrassment when you read your documentation on search.cpan.org and find that obvious spelling error you never saw before.

Test::Strict: Checks all Perl code to ensure that it compiles, checks whether it has use strict, and optionally checks whether it has use warnings. This is great; I use this even with packages that just have a few Perl utility scripts in a package otherwise written in C, since I previously kept releasing packages with helper scripts with obvious syntax errors since I didn't run them much. Another place it's useful is if you have internal Debian packages that wrap up Perl utility scripts; just create a small test suite with this module and run it from override_dh_auto_check and you won't ever release a package with scripts that won't run. You can also do test suite coverage testing with this module (and Devel::Cover), but that you'll want to mark maintainer-only since it takes a while.

Test::Synopsis: Checks the code in the SYNOPSIS section of any POD documentation to ensure that it compiles. No more syntax errors in your example code in the documentation!

All of these modules are, of course, packaged for Debian (thanks largely to the wonderful Perl packaging team!), and otherwise available on CPAN. There are a variety of others, of course; these are just the main ones that I've found personally useful.

One final whine, however: all this wonderful Perl testing infrastructure desperately needs a common module and a common interface for providing a list of files to check. Every one of these modules reinvents that wheel, and most of them do it a slightly different way than all the others. Most of them check in either blib or lib (or sometimes both). Most of them also add t. Most of them don't include Makefile.PL, but some of them do. Almost none of them look in examples. Some have a simple way to add more files or directories; some don't. Some can be run from the t directory and still find things, but most of them can't. And of course the syntax is different in every case! Most of the work in my driver scripts is finding the right way to add a few additional directories to the scope of the testing, and in one annoying case (Test::Synopsis), I actually had to use Perl::Critic::Utils (for lack of a better module) to build a list of files to check instead of using the built-in file list, since the built-in file list required all module code be in lib.

A module to abstract out all of this and provide a common interface would be easy to write. Most of the work would involve talking to all the authors of all these different modules and asking them to use it. If someone in the Perl community felt like doing that, it would make life so much easier for everyone.
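To make the shape of such an interface concrete, here is a small sketch of a shared file finder. It is written in Python purely for illustration (a real version would be a Perl module), and the function names, default directories, and detection rules are all assumptions rather than the behavior of any existing CPAN module:

```python
import os

PERL_EXTENSIONS = (".pl", ".pm", ".t", ".PL")


def is_perl_source(name, first_line=""):
    """Guess whether a file is Perl from its name or its #! line."""
    if name.endswith(PERL_EXTENSIONS):
        return True
    return first_line.startswith("#!") and "perl" in first_line


def find_perl_files(root, directories=("lib", "t", "examples"),
                    extra=("Makefile.PL",)):
    """Return sorted paths of Perl files in the named parts of a distribution."""
    found = []
    for name in extra:
        path = os.path.join(root, name)
        if os.path.isfile(path):
            found.append(path)
    for directory in directories:
        # os.walk silently yields nothing for a directory that does not exist.
        for parent, _, names in os.walk(os.path.join(root, directory)):
            for name in names:
                path = os.path.join(parent, name)
                with open(path, "rb") as handle:
                    first = handle.readline().decode("utf-8", "replace")
                if is_perl_source(name, first):
                    found.append(path)
    return sorted(found)
```

The point of the design is that every test driver would call the same function with the same arguments, so adding examples or Makefile.PL to the scope would be a one-line change everywhere.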

2013-01-05: Pod::Thread 1.00 (and spin 1.80)

Well, this wasn't really what I intended to do with the tail end of my vacation, but I got started on it last night after fixing a different bug and failed to stop.

Thread is the macro language (converted to HTML by a program called spin) that I use for maintaining all of my web pages, as mentioned in the previous release announcements from this week. Pod::Thread converts POD documentation into thread so that spin can convert it to HTML. This has multiple advantages over converting it directly to HTML for my web site, such as being able to use spin's methods for adding navigation links and not having to come up with separate style sheets.

This release finally converts Pod::Thread to use Pod::Simple as the POD parser, which means I no longer have any modules using Pod::Parser. I find Pod::Simple a bit more fiddly to work with, but it's actively maintained and higher-quality. This conversion, for example, fixes a problem with being too aggressive about turning =item into a numeric list, something that I really couldn't fix with Pod::Parser.

Also fixed in this release is the bug that got me started down this path: mishandling of URLs with anchor text.

This is the second Perl module that I've converted to my new coding style and augmented test suite, and I'm getting a little bit faster at it, but it still took me a full day. (Although that included the Pod::Simple rewrite, which required touching most of the code.) I'm not entirely sure that should have been my top priority today, but oh well, I had to do it sometime.

You can get the latest version from my web tools distribution page. I also uploaded a new Debian package to my personal repository.

I also released spin 1.80, a follow-on from the previous release, that fixes a bug in the new code to determine last modification date from Git.

2013-01-06: faq2html 1.31 and VCS migrations

faq2html is my version of that script that so many different people have written: a tool to convert text files into HTML for the web. This script is specifically tuned to the way that I format text files and has a lot of special logic to deal with my eccentricities, so I have no idea how well it would work for anyone else.

This release adds a new -l option, which says to add a subheading to the HTML output giving the last modified date based on the file modification timestamp. Previously, these subheadings were only added in some situations and based on RCS/CVS-style Id keywords in the file. Since I'm converting my personal repositories to Git, that was going to stop working.

You can get the current version of faq2html from my web tools distribution page.

With this update, I was able to convert my FAQs repository from Subversion to Git, which means that I only have one personal repository left in Subversion (PGP::Sign, which I will probably convert to Git shortly).

That leaves a few CVS repositories, and therein lies a problem.

There are a few places where I'm still using CVS for a conventional package or for some random files, and those can move to Git without much trouble whenever I get around to it. But most of my remaining use of CVS is for storing individual scripts that, if I distribute them at all, I distribute individually. And, for that, CVS seems to have a lot of advantages that are hard to implement with Git:

CVS maintains per-file revision numbers that I can manipulate if I want during check-in. This means that the revision control system maintains a reasonable release version scheme for me without having to do any additional work. I use the CVS-generated version numbers, with the occasional manual bump to a new major revision, for all of my scripts.
CVS Id keywords expose both the version number and last modified date to the file itself. Nearly all of my scripts have a bit of boilerplate Perl code that uses this to generate -v output showing the version number and last modified date. This is a nice feature; when you have a running script, you can always figure out its vintage. Maintaining this information in the script by hand is a pain.
CVS revision history maps directly to version numbers. This lets me generate, automatically, nice changelog pages like this one for my backport script. Without that concept of per-file version numbers for a script repository, I would have to determine the mapping of changes to version numbers manually somehow.
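For readers who have not used keyword expansion, here is a sketch of the kind of boilerplate described above. It is in Python rather than Perl, the script name and revision in the sample string are hypothetical, and it assumes the usual expanded form of the keyword (file name, revision, date, time, author, state):

```python
import re

# Expanded RCS/CVS keyword: "$Id: FILE,v REVISION DATE TIME AUTHOR STATE $".
# Older CVS writes dates with slashes and newer versions with hyphens.
ID_PATTERN = re.compile(
    r"\$Id: (?P<file>\S+),v (?P<revision>[\d.]+) "
    r"(?P<date>\d{4}[/-]\d{2}[/-]\d{2}) (?P<time>[\d:]+) "
)


def version_banner(id_string):
    """Build "-v" style output from an expanded Id keyword."""
    match = ID_PATTERN.search(id_string)
    if not match:
        return "unknown version"  # the keyword was never expanded
    date = match.group("date").replace("/", "-")
    return "%s %s (%s)" % (match.group("file"), match.group("revision"), date)


# Hypothetical value; in a real script CVS fills this in at check-in.
ID = "$Id: example-script,v 1.5 2013/01/06 20:15:00 author Exp $"
print(version_banner(ID))
```

With Git there is no per-file revision to expand, so both the version number and the date in a banner like this would have to come from somewhere else.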

I'm not at all sure what I'm going to do about this. Some of my scripts currently maintained in CVS, such as all the AFS tools, should move into larger packages that are more conventionally distributed, with a separate README file and similar machinery, and I'm slowly in the process of doing that. But many of my scripts are small programs that don't fit into any larger whole and don't warrant that sort of treatment.

Honestly, I'm moderately tempted to just keep using CVS for scripts, since everything newer goes to repository-wide instead of per-file version numbers (for exceptionally good reasons when working on a unified piece of software, rather than a conglomeration of individual scripts), and I'm unenthused about writing my own machinery using Git hooks and smart Git log and diff parsing. But the CVS command-line interface is awful and the CVS repository format is even worse.

If there are any brilliant ideas that I'm missing, I'd be interested in hearing about them in email (or other blog posts on Planet Debian or somewhere else I read). If I come up with some solution or get any great advice, I'll post a follow-up.

2013-01-07: Term::ANSIColor 4.02

The previous release of my module to handle terminal escape sequences for text attributes and color worked fine on every version of Perl except for 5.6.2. This is the sort of thing that bothers me, since I'm trying to support every version of Perl from 5.6 on, but I don't have that old of a version around to test with any more.

Thankfully, David Cantrell (who ran a CPAN tester on that version) was kind enough to investigate further and found that the Exporter in that version requires any tags be listed before functions. After that tweak to one of the test cases, it appears to work properly (so the module itself was fine).

This release also adds the minimum Perl version to the module metadata.

You can get the latest version from the Term::ANSIColor distribution page.

2013-01-08: postfaq 1.17

I promise that the software releases will decrease in velocity soon! Work is starting up again and will involve longer projects, so I won't be able to keep releasing something daily. I'm still finishing up some fallout from my personal revision control migrations.

postfaq is my script for posting Usenet FAQs, inspired by the venerable auto-faq. This release adds support for generating the Last-Modified subheader from the modification timestamp of the FAQ file. Previously, information for that subheader was only taken from an embedded RCS/CVS Id string. It also adds a new -d command-line option that changes the default directory for storing status files.

You can get the latest version from the postfaq distribution page.

2013-01-09: afs-monitor 2.3

Today was a travel day, coming back home from the holidays, so I'm rather stunned that I managed to do yet another software release. But there were a few minor bugs in afs-monitor that didn't take long to fix, and I talked myself into making a release rather than spending more time updating the coding style.

afs-monitor is a package of check scripts for AFS servers. They're designed with Nagios in mind, but they should work with most monitoring systems that can call external checks in a similar way as Nagios.

This is a bug fix release that fixes a relatively serious problem with check_afs_quotas. It has been there since the beginning, but was reported to me by three separate people in the past month; it's weird how those sorts of things come in spurts. It also teaches check_afs_bos about some standard bos output for a feature I don't use (scheduled jobs).

You can get the latest version from the afs-monitor distribution page. New Debian packages (as nagios-plugins-afs) have been uploaded to my personal repository. (If anyone would find it useful to have those in Debian proper, drop me a note and I'll do that.)

2013-01-10: release 1.45 and multiple PGP signatures

release is the tool that I use to manage software releases. This version adds support for signing a tarball with multiple PGP keys at once. For WebAuth, we have an official signing key, but I also wanted to add my personal signature. The minimum supported version of Python was also bumped to 2.5 (see below for why).

You can get the latest version from my scripts page.
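One way to produce several signatures in a single detached file is to pass gpg more than one --local-user option. The sketch below only builds the argument list; the function name and the key IDs are placeholders, and this is not necessarily how release itself does it:

```python
def gpg_sign_command(tarball, key_ids):
    """Build a gpg argument list that signs tarball with every key in key_ids."""
    if not key_ids:
        raise ValueError("at least one signing key is required")
    command = ["gpg", "--armor", "--detach-sign"]
    for key_id in key_ids:
        # Each --local-user option asks gpg for one more signature.
        command += ["--local-user", key_id]
    command += ["--output", tarball + ".asc", tarball]
    return command


# Placeholder key IDs; substitute the project key and a personal key.
print(gpg_sign_command("package-1.0.tar.gz", ["AAAA1111", "BBBB2222"]))
```

Because the result is a list rather than a string, it can be handed directly to the subprocess module without going through a shell.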

Normally, I wouldn't announce this to everyone, since it's a fairly minor change, except I had a question and a quick Python note.

The question: how does one get GnuPG to verify multiple signatures? When I run GnuPG on the resulting detached *.asc file created by this new code, I get the following:

gpg: WARNING: multiple signatures detected.  Only the first will be checked.
gpg: Signature made Thu 10 Jan 2013 03:58:43 PM PST using RSA key ID 5736DE75
gpg: Good signature from "Russ Allbery <rra@stanford.edu>"
gpg:                 aka "Russ Allbery <rra@debian.org>"
gpg:                 aka "Russ Allbery <eagle@windlord.stanford.edu>"
gpg:                 aka "Russ Allbery <eagle@eyrie.org>"

That's nice, I suppose, but it's not as helpful as it could be. Why not verify both signatures and report all of the results? I searched both the gpg man page and the Internet for some option I'm missing, without any luck. (Searching Google for that exact message amusingly turns up lots of verbatim output from Debian repository verification and almost nothing else.)

If anyone knows, send me email (rra@stanford.edu or any other convenient address), and I'll update this post with the answer.

UPDATE: gpg can currently only verify multiple signatures if the signatures are from keys with the same class and digest. I was attempting this with a 2048-bit RSA key and a 1024-bit DSA key, which gpg does not support. (Yet another reason to replace the WebAuth signing key with a newly-generated key.) Thanks to Philip Martin and Bernhard R. Link for the information!

The note is that, way back when I was first learning Python, one of the things that bothered me the most was the lack of safe program execution. The best available was os.spawnv(), which left something to be desired. If you too were bothered by this but don't program in Python enough to keep track of the current state of the art, there's now a subprocess module that has much saner interfaces that work properly, do reasonable things, and avoid the shell. It was added in Python 2.4, with some nicer interfaces added in Python 2.5, so it's fairly safe to use at this point.
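A short sketch of what that looks like in practice, using only Popen (the Python 2.4 interface) and check_call (added in 2.5), written here in current Python syntax. The capture helper is a hypothetical name for illustration:

```python
import subprocess
import sys


def capture(command):
    """Run command (a list of arguments, never a shell string); return stdout."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    output, _ = process.communicate()
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, command)
    return output.decode()


# Shell metacharacters in an argument are harmless because no shell is involved.
argument = "hello; echo $HOME"
echoed = capture([sys.executable, "-c", "import sys; print(sys.argv[1])", argument])
print(echoed.strip())

# check_call raises CalledProcessError if the command exits with nonzero status.
subprocess.check_call([sys.executable, "-c", "pass"])
```

The child process receives the argument exactly as written, semicolon and dollar sign included, which is the safety property that the older interfaces made hard to get.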

2013-01-12: afs-monitor 2.4

I love free software. Two more patches from users of my AFS monitoring scripts (primarily for Nagios) on the heels of the last release, so have another release. This one adds another ignore line to check_afs_bos for the output when there are long-running scheduled jobs, and adds an optional regex filter to check_afs_quotas to select the volumes whose quota one wants to check. Thanks to Georg Sluyterman and Christian Ospelkaus for the patches.

You can get the latest version from the afs-monitor distribution page. As before, Debian packages have been uploaded to my personal repository.

Also, a quick note about a previous journal entry: Philip Martin and Bernhard R. Link pointed me at the relevant part of the gpg source, which will only check signatures if all signatures use the same class and digest. I updated my previous entry. Thank you!

2013-01-13: Slacktivism that may actually help

Normally, I'm not that much of a fan of the slacktivist trend of signing pre-canned petitions, signing things that political organizations send in the mail, and so forth. It takes very little effort and therefore carries very little weight. A lot of those efforts are more exercises in helping the signers feel better about themselves. But there is the occasional exception.

Most of you probably already know about the death of Aaron Swartz. For those who aren't familiar, see Larry Lessig's article. Swartz suffered from depression (and, seriously, fuck depression — it's an awful, horrible disease), but there's little doubt that the ongoing federal prosecution using the full weight of the US district attorney's office to hound him for what amounts to political trespassing was part of what led to his suicide.

As it happens, I personally believe that Swartz committed a crime, and probably should have paid some consequence for it (on the order of a fine or some community service). Hooking your devices up to someone else's network without their permission and messing around in their wiring closets (locked or not) is, for me, akin to traipsing into someone's barn or backyard shed without their permission and using some of their tools because you want to use them. And whether or not one likes the current copyright regime (and I don't like it at all), MIT is still in the awkward position of having to work with it. Abusing their license for your political goals is effectively recruiting them into your activism without their permission, and as I've mentioned before, I have a mild obsession with consent.

It was a crime. However, it was a minor crime, and that's where this whole situation went completely off the rails. One aspect of a justice system is fairness: uniform application of the laws to everyone. However, another aspect of a justice system is proportionality: punishments that fit the crime. Without ever having been convicted, Swartz was already punished completely out of proportion to what he actually did, both financially and emotionally, by prosecution by the US attorney that went far, far beyond zealous into actively abusive. This despite the fact that the owner of the academic papers that he was downloading as an act of open access activism stated they did not want the case to proceed and asked the US government to drop the charges.

I don't think what Swartz did was right, or legal. However, the correct reaction was "look, involuntarily recruiting MIT as an accomplice to your act of civil disobedience is not okay — don't ever do that again." Not "you are evil and should be locked up in prison for 35 years." And I'm completely fed up with the disproportionality of our justice system and the practice of ridiculous over-charging of crimes in an attempt to terrify people into bad plea bargains.

Which brings me to the slacktivism. This is exactly the sort of situation where popular opinion matters. US attorneys who have lost the faith and support of the population they serve won't keep their jobs. And no one in government particularly wants this case to be splashed across the front pages, or to have to answer questions about the appropriateness or proportionality of the prosecution while people are mourning a dead young man. If we make it clear enough to the Obama administration that people are watching, that this matters, and that we're angry about it, not only is it quite likely there will be consequences for this prosecutor, but it may serve as a deterrent for other prosecutors in the future.

There is a petition on WhiteHouse.Gov to remove the district attorney for prosecutorial overreach, and for once taking five minutes to create an account and clicking on a petition may be both useful and appropriate. Taking disciplinary action here is, unlike with a lot of petitions, something that the Obama administration can actually do, directly, without involving the rest of the dysfunctional US government, and without making new law. If you are a US citizen, please consider going to this site and signing it to say that this matters to you and you believe this prosecution was excessive and inappropriate.

Please note: you do not have to believe that Swartz died solely or even primarily because of this prosecution to do this. We'll never really know the complex factors behind his death. But this was a burden that he shouldn't have had to deal with.

Please also note that you do not have to think he was justified in his behavior to sign this petition. This is not a question of whether what he did should be legal or was ethical. Rather, even assuming it was illegal, it's a question of appropriate punishment and proportionality of response. Whether or not you agree with his political cause, there simply was not enough damage done, to anyone, to warrant this kind of aggressive prosecution. And the direct victims agreed, which was the point at which the district attorney should have scaled way back on their actions or dropped the matter entirely.

Not doing so was an abuse of office and position, and that should have consequences.

ETA: Corrected "taking" to "using" in the analogy about tools. I was actually thinking "taking and then returning" when I wrote that, but only the first word made it into the post, and ended up creating a confusing parallel with theft that wasn't intended.

2013-01-14: More on Aaron Swartz

I got some feedback that the analogy I used in my last post was a bit confusing, and indeed I blew the phrasing of the analogy (also now corrected). So let me try this again, since I think there's a subtlety here that may be missed.

I should note for the record that my understanding of what Swartz did that started the process is apparently somewhat based on the description from the prosecution, so it may not be the complete or accurate facts. Since there will now be no trial, we may never find out what the defense was, and whether those facts would be challenged. So it may be best to think of this as a hypothetical. We never established, or will establish, in court exactly what happened.

Swartz was, generally speaking, charged with two things that I consider quite distinct, at least from an ethical perspective. Most of the focus is on the copyright part: downloading JSTOR articles with an intent (never acted upon) to distribute them to the world. There are a bunch of reasons why this may or may not be justified, which are tied into the origin of those articles (many of them were publicly funded) and the legitimacy of copyright licensing agreements. I think there is significant room to hold a variety of opinions on this, although I don't believe that "crime worth 35 years in prison" is even remotely close to justifiable under any interpretation.

However, there is a second part of what he was charged with, and that's primarily what I was commenting on, not the copyright part. He allegedly hooked up a laptop in an unlocked MIT wiring closet without permission and then used MIT's network and JSTOR license to download information. This is, to me, a subtly but entirely distinct act from the question of whether taking JSTOR's data was ethically or legally wrong. Whether or not one believes that JSTOR's copyrights are legitimate, it's still not okay to use someone else's network and license or to trespass in their wiring closets without permission. (I work in central IT for a university, so this strikes closer to home.)

And this is where the analogy came in, which I flubbed. I had originally said that this was akin to "traipsing into someone's barn or backyard shed without their permission and taking some of their tools because you want to use them." The word "taking" was wrong; it falsely implies that you weren't going to return the tools. I had been thinking "taking and then returning" when I wrote that, and the important second part didn't make it into the post. So let's try this again:

Swartz's actions at MIT, as I understand them (and many things could change this, such as a revelation that he had MIT's prior permission), are akin to going onto someone's property and into their barn without permission, borrowing their hedge trimmer for a while because you want to use it, and returning the hedge trimmer without any damage when you're done (and without them noticing it was gone).

I'm quite fond of this analogy, since I think it clearly establishes two things:

Most people are going to be unhappy about this happening and will intuitively feel like it should probably be illegal. Not everyone; there are folks who don't believe in personal property, or at least wouldn't extend it to tools in a barn. But most people will feel that someone should ask first before they come borrow your tools. Even if they don't damage them, even if they return them before you notice they're gone, you might have wanted to use the tool at the same time or they might have damaged them without intending to, and they should just ask first. It's common politeness, and depending on the circumstances, someone who doesn't ask and is covert about borrowing tools might be worth calling the police over.
There is absolutely no way in any reasonable moral system that doing this should result in 35 years in prison. Or even 10, or even 1. Yes, most people would consider this a crime, but most people would consider it a minor crime. It's the sort of thing where you might have to impose some consequences just to make sure the message of "knock it off" is delivered firmly, but someone doing this is rude and inconsiderate, not evil.

The JSTOR copyright stuff is more complex to analyze and is more politically divisive, but for me the key points are (a) Swartz never released the data, and (b) JSTOR declined to press charges. To me, that means the deeper copyright questions, which are quite interesting, were never actually reached in this particular case. The crime that did apparently happen was the trespass at MIT, and I think the above analogy is the right way to think about it.

The point I do want people to take away from this is that one should not overlook the trespass at MIT even if one wants to celebrate the undermining of the copyright regime and doesn't believe JSTOR's data should be considered their private property. Social activism and political disobedience are important and often valuable things, but performing your social activism using other people's stuff is just rude. I think it can be a forgivable rudeness; people can get caught up in the moment and not realize what they're doing. But it's still rude, and it's still not the way to go about civil disobedience.

For both ethical and tactical reasons, involving bystanders in your act of social activism without their consent is a bad approach.

ETA: The problem, of course, with discussing all of this is that while it's relevant and possibly even somewhat important in the broader sense of how our community acts going forward, it also doesn't capture the fact that this was only one incident in a remarkable life. One of the worst problems with the abusive prosecution of Swartz is that it blew this incident completely out of proportion. A moment of arguable judgement should not dominate one's life or cast a shadow over all of one's other accomplishments; the prosecution tried to make it do just that. That's part of what I'm arguing, but ironically that partly feeds into the lack of proportionality in the discussion.

I've gotten kind of far afield here, so let me go back and say explicitly: Swartz was a remarkable person who did much to admire and respect, and the world is a worse place without him.

2013-01-15: afs-admin-tools 2.0

For years, I've distributed the individual AFS administrative tools that we use at Stanford as separate software releases on my web site, but this has clearly been less and less efficient. We've needed to package them for use at Stanford and didn't want to create separate packages for each individual script, we've wanted to get them out of their current CVS repository, and they didn't support any unified configuration system or simple method for local site customization.

I started trying to solve those problems over a year ago by converting all the scattered CVS files into a coherent Git repository and turning the collection of scripts into an actual distribution that I can version and release more like regular software packages. But other things kept coming up, and it's one of the yearly goals I had for last year that I didn't finish.

Today, I finally took the time to finish this off. The scripts frak, fsr, lsmounts, mvto, partinfo, volcreate, volcreate-logs, and volnuke, which were previously distributed separately, are now collectively distributed as afs-admin-tools and all support a standard configuration file and location. I also rolled up some previously unreleased fixes and removed some remaining Stanford-specific behavior.

I have not, with this release, incorporated any of the various changes that people have sent me to the individual scripts. That's next. They're also all still standalone scripts without any common code moved into a Perl module, but I may change that in the future.

Note that some of these scripts refer to scripts that will be part of the afs-mountpoints package. Putting that together is my next task, but it's a bit trickier since we moved the mount point database into an actual database, so I need to put some work into the Perl modules that wrap that database. Hopefully I can get that released before too long as well.

You can get the latest release from the (new! shiny!) afs-admin-tools distribution page.

2013-01-16: A few last thoughts on Aaron Swartz

Daniel Kahn Gillmor has a very good blog post. You should read it. It includes a thoughtful rebuttal to some of my earlier thoughts about activism.

I think I'm developing a richer understanding of where I see boundaries here, but after my last post, I also realized that by focusing on the specific details of what should have been a minor alleged crime, I'm derailing. Swartz did so much else. I made a note to come back to the more theoretical discussion in six months; now isn't the time. Now is the time to celebrate open content and all of the things Swartz achieved. (But thank you very much to the multiple people who have pointed out flaws in my reasoning and attempted approach.)

Hopefully, it's also an opportunity to keep the pressure on for a saner and less abusive judicial system that doesn't threaten people with ridiculous and disproportional punishment in order to terrify them into unwarranted plea bargains. The petition I mentioned has reached nearly 40,000 signatures and passed the threshold (at the time it was posted) for forcing a White House response.

Probably more importantly, it also seems to be creating the feedback cycle that I was hoping to see: the popularity of the petition is causing this story to stay in the news cycle and continue to be written about, which in turn drives more signatures to the petition. I'm not particularly hopeful that the Obama administration cares about the vast and deep problems with our criminal justice system, but I'm somewhat more hopeful that they, like most politicians, hate news cycles that they don't control. The longer this goes on, the stronger the incentive to find some way to make it go away, which could lead to real disciplinary action.

A key committee in the US House of Representatives is starting a formal investigation. One of my local representatives has proposed modifying the US federal law on computer fraud and abuse to remove violations of terms of service from the definition of the crime. (I don't have much hope that this will pass when proposed by the minority party in a fairly hostile House, but the mere act of proposing it keeps the news focus on.) Glenn Greenwald has a (typically long-winded) round-up of news in the Guardian. Note that both Greenwald and Declan McCullagh link directly to the petition in articles in mainstream news outlets.

One thing that slacktivism can do is perpetuate a news cycle until it gets more uncomfortable for people in power. It's still nowhere near as effective as the types of activism that Swartz was so good at, but in this specific case I think one gets a reasonable return on one's five-minute investment of effort.

I'm going to stop talking about this now, since other people are a lot better at this sort of post than I am. But one last link: the medical community has a related open content problem, and theirs is also killing people. Possibly people you know. If all of this has inspired you, as it has me, to care even more about open content, be watching the push for open access to clinical trial data. More background is in Ben Goldacre's TED talk.

2013-01-18: distribrc 1.0

I was just talking to my dad about this script the other day, so now seems like as good of a time as any to finally stick it up with my other publicly released small scripts.

distribrc is a little Perl script I wrote a long time back that copies all my dot-files (everything from .Xresources and .screenrc to more obscure stuff like .gitconfig and .perlcriticrc) to all of my accounts. It supports a simple configuration file that specifies locations to copy files to, commands to do the copying, and which set of files to copy to each host, so it's survived my transition from Kerberos rcp to GSS-API authenticated ssh without needing a single change.

In fact, when I went to tweak the documentation for this public release, I discovered that it was at version 0.7 and the last commit was in May of 1997. So this is a script that I've been using, probably at least once a month, without a single change, for more than 15 years. I suppose you could call that stable. It at least seemed to warrant a 1.0 version number.

The Perl coding style is kind of awful, but hey. At some point I'll probably go through my various scripts and update coding style, and maybe figure out some test suite mechanism, but in the meantime someone else may find it useful. More thorough measures, like using Git to track one's home directory, are probably better, but this script has the advantage of being quite simple and making it very easy to customize exactly what you want to send to each host.

You can get it from my miscellaneous scripts page.

2013-01-20: Have some clouds

It's been a long time since I've posted any photographs here (about 15 months, it looks like), so have some clouds.

This also serves as a test for whether the scripts I use to post photographs have been properly converted to Git. (And a test to see if the occasional amateur photograph eating up screen real estate will provoke annoyed mutters from the direction of Planet Debian, although so far no complaints over my even-longer book reviews.)

I'm still taking photographs occasionally, although not as much as I was at one point. Too many hobbies and not enough time. Part of why I've been slow in posting them is that I was bad about keeping notes about where I took them and bad about keeping up with sorting through them. So I have large collections of photographs from some days where I don't remember exactly where I was, and where I've not already singled out the ones that are worth sharing.

One of the things my hobbies always seem to generate is yet another pile of stuff that's not organized quite to my liking. Maybe some evening I'll feel inspired to do a bit more sorting.

This is a long weekend in the US for some of us, myself included, so I get another day in this weekend. Which is good, since I've only gotten to a fraction of what I wanted to get to, although at least I found several hours to play video games today.

I have finally converted PGP::Sign to Git, so I have no more personal Subversion repositories, but I forgot how much code I'd written to handle its build-time configuration. All of that desperately needs to be reworked, so it's going to take a bit longer than I expected before I can push out another release.

2013-01-22: Log parsing and infinite streams

I have a problem I have to solve for work that involves correlating Apache access and error logs. Part of WebAuth logs successful authentications to the Apache error log, and I want to correlate User-Agent strings with users so that we can figure out what devices our users are using by percentage of users rather than percentage of hits. The problem, as those who have tried to do this prior to Apache 2.4 know, is that Apache doesn't provide any easy way to correlate access and error log entries (made even more complex because two separate components are involved).
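
To make the "percentage of users rather than percentage of hits" distinction concrete, here is a minimal sketch in Python (not the Perl I'm actually using). It assumes the hard part, pairing each request's User-Agent with an authenticated user, has already been done; the function names and the sample data are made up for illustration.

```python
from collections import Counter


def share_by_hits(hits):
    """Fraction of requests per user agent.

    hits is a list of (user, user_agent) pairs, one pair per request.
    """
    counts = Counter(agent for _user, agent in hits)
    total = sum(counts.values())
    return {agent: count / total for agent, count in counts.items()}


def share_by_users(hits):
    """Fraction of distinct users seen with each user agent.

    Someone who uses two browsers is counted under both, so the
    fractions can add up to more than 1.
    """
    unique_pairs = set(hits)  # each (user, agent) combination counts once
    counts = Counter(agent for _user, agent in unique_pairs)
    users = len({user for user, _agent in unique_pairs})
    return {agent: count / users for agent, count in counts.items()}


# Hypothetical data: one heavy user and one light user.
sample = [
    ("alice", "Firefox"),
    ("alice", "Firefox"),
    ("alice", "Firefox"),
    ("bob", "Safari"),
]
by_hits = share_by_hits(sample)    # Firefox dominates: 3 of 4 requests
by_users = share_by_users(sample)  # an even split: 1 of 2 users each
```

Counting by hits lets one busy user skew the picture; counting by users is what the device inventory actually needs.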

I could have just hacked something together, but I've written way too many ad hoc log parsers, and, see, I was reading this book....

The book in question is Mark Jason Dominus's Higher Order Perl. I'm not quite done with it, and will post a full review when I am. I have some problems with it, mostly around the author's choice of example problems. But there is one chapter on infinite streams, and the moment I read that chapter on the train, it struck me as the perfect solution to log parsing problems.

I'm not much of a functional programming person (which is where Dominus is drawing most of the material for this book), so I don't know if this terminology is standard or somewhat unique to the book. An infinite stream in this context is basically a variation on an iterator that lets you look at the next item without consuming it. The power comes from putting this modified iterator in front of a generator, using it to consume one element at a time, and then composing this with transformation and filtering functions. That gives you all the power of the iterator to store state and lets you process one logical element from the stream at a time, without inverting your flow of control.
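
Here is a rough sketch of that idea, written in Python rather than Perl. It is neither the code from the book nor my module: the names Stream, head, get, transform, and keep are invented for this illustration, and it assumes that None never appears as a real element, since None is used to signal the end of the stream.

```python
import itertools

_EMPTY = object()  # marks "nothing has been looked at yet"


class Stream:
    """Wrap any iterable so the next item can be seen without consuming it."""

    def __init__(self, source):
        self._source = iter(source)
        self._buffer = _EMPTY  # holds at most one looked-ahead item

    def head(self):
        """Return the next item without consuming it (None at the end)."""
        if self._buffer is _EMPTY:
            self._buffer = next(self._source, None)
        return self._buffer

    def get(self):
        """Consume and return the next item (None at the end)."""
        item = self.head()
        self._buffer = _EMPTY
        return item


def transform(stream, func):
    """Return a new stream that applies func to each element on demand."""
    def generate():
        while stream.head() is not None:
            yield func(stream.get())
    return Stream(generate())


def keep(stream, predicate):
    """Return a new stream holding only the elements predicate accepts."""
    def generate():
        while stream.head() is not None:
            item = stream.get()
            if predicate(item):
                yield item
    return Stream(generate())


# Nothing is computed until an element is requested, so the underlying
# generator can be infinite.
squares = transform(Stream(itertools.count(1)), lambda n: n * n)
even_squares = keep(squares, lambda n: n % 2 == 0)
peeked = even_squares.head()  # looks at the first even square
first = even_squares.get()    # consumes that same element
second = even_squares.get()   # moves on to the next one
```

For log parsing, the generator at the bottom would read lines from a file, a transform would turn each line into a parsed record, and the caller would still pull one logical record at a time.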

Dominus provides code in the book for a very nice functional implementation of this that's about as close as you're probably going to get to writing Haskell in Perl. Unfortunately, his publisher decided to write their own free software license, so the license is kind of weird and doesn't explicitly permit redistribution of modified code. It's probably fine, but I didn't feel like dealing with it, and I'm more comfortable with writing object-oriented code at the moment (at least in Perl), so I decided to write an object-oriented version of the same code specific for log parsing.

That's what I've been doing since shortly after lunch, and I can't remember the last time I've had this much fun writing code. I have a reasonable first cut at a (fully tested and fully test-driven) log parsing framework built on top of a reasonably generic implementation of the core ideas behind infinite streams. I also used this as an opportunity to experiment with Module::Build, and have discovered that the things I most disliked about it have apparently all been fixed. And I'm also using Perl 5.10 features. (I was tempted to start using some things from 5.12, but I do actually need to run this on Debian stable.) It's rather satisfying to write a thoroughly modern Perl module.

There are some definite drawbacks to writing this in an object-oriented fashion. There's rather more machinery and glue that has to be set up, it's probably a bit slower, and it tends to accumulate layers of calls. One of the advantages of the method with standalone functions and a very simple, transparent data structure is that it's easier to eliminate unnecessary call nesting. But I suspect the object-oriented version will do what I want without any difficulties, and if I feel very inspired, I can always fiddle with it later.

Maybe I'll eventually use this as a project to experiment with Moose as well.

I'm surprised that no one else has done this, but I poked around on CPAN a fair bit and couldn't find anything. This will all show up on CPAN (as Log::Stream) as soon as I've finished enough of it to implement my sample application. And then I'll hopefully find some time to rewrite our metrics system using it, which should simplify it considerably....

2013-01-23: Still having a blast

I did not realize how much I was missing this sort of heads-down programming time in my life until the last two days. I think the last time that I had this much fun with my job was when I was learning Java, and even that wasn't quite this much fun since it was a lot of experimentation rather than being directly productive in a language I know well.

Here's a short list of modern Perl, Perl techniques, and interesting modules that I'd not had a reason to use before today:

Profiling and Devel::NYTProf. What a lovely piece of software. I'm starting to like this trend towards useful developer tools dumping their output in HTML, particularly when many of them are this pretty. (Devel::Cover by way of Test::Strict also does this, and I've been making heavy use of it.)
Memoize. I've known it existed for years, but I've never had a reason to use it. Memoizing str2time when doing log parsing cut the time by a quarter. (I'll probably write a hand-rolled date parser for the specific date format eventually, since str2time is so general that it's really slow, but wrapping Memoize around it takes five seconds.)
Module::Build. I was staying away from this because of all the problems Debian had with embedded code copies, but it appears to not be doing that any more (or at least I can use it in a way that it doesn't). And it really is so much nicer than ExtUtils::MakeMaker. I will probably convert all of my packages over time, although I have to see what it's like for XS modules.

ETA 2013-01-24: Gregor pointed out that I was confusing Module::Build with Module::Install. I feel silly now; I've been avoiding Module::Build solely because I've been confusing it with something else entirely!
HTTP::BrowserDetect. This isn't a general piece of software, of course, but once again it was lovely to discover that CPAN already contained exactly the piece of software that I needed. I'm trying to produce a summary of what browsers our authenticated users use as part of a general software inventory, and this turned the nonsense that browsers send into exactly the sort of management-consumable classifications I needed.
IO::Compress and IO::Uncompress. Yes, this is much nicer than what I used to have to do with Compress::Zlib. It's like people kept working on Perl while I've been using all the things I learned five years ago!
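
For anyone who hasn't used memoization, the Memoize item above boils down to something like the following Python sketch. It uses functools.lru_cache as a stand-in for Perl's Memoize and datetime.strptime as a stand-in for str2time; the function name parse_timestamp and the Apache-style timestamp format are assumptions for illustration, and the month abbreviation parsing assumes an English locale.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_timestamp(text):
    """Parse a timestamp such as 22/Jan/2013:14:05:09 -0800.

    Results are cached by input string, so the many log lines that share
    the same second only pay for one real parse.
    """
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


# Two log lines written in the same second: the second call is a cache hit.
first = parse_timestamp("22/Jan/2013:14:05:09 -0800")
again = parse_timestamp("22/Jan/2013:14:05:09 -0800")
```

The trade-off is memory: an unbounded cache keeps one entry per distinct timestamp, which is fine for a batch run over logs but would need a size limit in a long-running process.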

I still have tomorrow morning to continue focusing on this and will hopefully finish up the work to feed long time periods of logs into my analysis tool. And, even better, I have a log parsing framework that I really like and that I think will make any number of things easier in the long run. (Also, it has a remarkably thorough test suite. I love test-driven development. Although I haven't written comprehensive tests for the very final front-end piece, since generating sanitized log input to trigger all the cases is too much effort right at the moment.)

2013-01-24: The "Why?" of Work

(This is going to be long and rambling. Hopefully at some point I'll be able to distill it into something shorter.)

In preparation for a tech leads retreat tomorrow, several of us at work were asked to watch Simon Sinek's TED talk, "How great leaders inspire action".

I'll be honest with you: I hated this talk. Sinek lost me right at the start by portraying his idea as the point of commonality among great leaders (don't get me started on survivorship bias; it's a pet peeve) and then compounded the presentation problem with some dubious biology about brain structure. So, after watching it, I ranted a bit about how much I disliked it (to, as it turns out, people who had gotten a lot out of it).

(Don't do this, btw. It's nearly always worthwhile to suppress negativity about something someone else enjoyed. I say this to increase the pool of people who can remind me of what I said the next time I forget. Which, if my normal pattern holds, will be about five minutes from now.)

Thankfully, I work with tolerant and forgiving people who kindly pointed out the things they saw in the video that I missed, and we ended up having a really good hour and a half discussion, which convinced me that there's an idea under here that's worth talking about. It also helped clarify for me just how much I hate the conventional construction of both leadership and success.

This talk is framed around getting other people to do things, which is one of the reasons why I had such a negative reaction to it. It's right there in the title: leaders inspiring action. This feeds into what at least in the United States is an endemic belief that the world consists of leaders and followers, and that the key to success in the world (in business, in politics, in everything else) is to become one of the leaders and accumulate followers (most frequently as customers, since we have a capitalist obsession). This is then defined as success. I think this idea of success is bullshit.

Now, that statement requires an immediate qualification. Saying that the societal definition of success is bullshit is a statement from privilege. I have the luxury of saying that because I am successful; I'm in a position where I have a lot of control over my own job, I'm not struggling to make ends meet, and I can spend my time pondering existential questions like how to define success. If I were a little less lucky, success would be whatever put food on the table and kept a roof over my head. I'm making an argument from the top of Maslow's hierarchy. But that is, in a roundabout way, my point: why is defining and constructing success still so hard, and why do we do such a bad job at it, even when we're at the top of the pyramid and able to focus on self-actualization?

The context of this talk for my group is pre-work for a discussion about, in Sinek's construction, the "why?" of our group. Why are we here, what is our purpose, and what do we care about? By the mere fact that we are able to ask questions like that, you can correctly surmise that we're already successful. The question, therefore, is what should we do with that success?

I normally hear one or more of the following answers, all of which I find unsatisfying or problematic.

Do more of the things that made us successful. This is useful in that not being successful sucks, so one does want to do some amount of maintenance, but it's also profoundly unsatisfying to me personally. What's the point in being successful if one can't use that success to do something more interesting, exciting, or fulfilling? It feels like an argument from fear: the most important thing to do with success is to ensure that one stays successful.
Use that power to change the world. Now, I want to be very careful here: there are some people who embrace this approach to success and do, in fact, change the world for the better. I would never want to say that this is a bad choice in general.

But the problem I personally have with this approach is that the failure rate for changing the world is quite high. You are probably not going to change the world. That doesn't mean don't try; sometimes you do, and when it happens, it's awesome. But the way my mind works argues against setting this as an explicit goal for me. Setting explicit goals that I fail at messes me up: I care too much about not failing and then I burn out. It works better for me psychologically to treat this as a possibility and a side effect of another goal.
Teach other people how to be successful. Full points for altruism. But let's face it: most of being successful is basically luck, and there's that survivorship bias problem again. It's occasionally possible to do some actual scientific studies and figure out what leads to success, but most of the time you'll instead make up good-sounding stories about things that you believe strongly in and present them as your method for achieving success. I'm sure they're great stories, and they're probably quite inspiring. But they also probably don't have much to do with why you're successful.

So, what should I do with success? Or, put another way, since I have the luxury of figuring out a "why?", what's my "why?"

This question comes at a good time. As I've mentioned the last couple of days here, I've just come off of two days of the most fun I've had at work in the last several years. I spent about 25 hours total building a log parsing infrastructure that I'm quite fond of, and which may even be useful to other people. And I did that in response to a rather prosaic request: produce a report of user agents by authenticated unique users, rather than by hits, so that we can get an idea of what percentage of our community uses different devices or browsers.

This was a problem that I probably could have solved adequately enough for the original request in four hours, maybe less, and then moved on to something else. I spent six times that long on it. That's something I can do because I'm successful: that's the sort of luxury you get when you can define how you want to do your job.

So, apparently I have an answer to my question staring me in my face: what I do with success, when I have it, is use that leeway to produce elegant and comprehensive solutions to problems in a way that fully engages me, makes the problem more interesting, and constructs infrastructure that I can reuse for other problems.

Huh. That sounds like a "why?" response that's quite common among hackers and software developers. Nothing earth-shattering there... except why is that so rare in a business context? Why isn't it common to answer questions like "what is our group mission statement" with answers like that?

This is what I missed in the TED talk, and what the subsequent discussion with my co-workers brought to light for me. I think Sinek was getting at this, but I think he buried the lede. The "why?" should be something that excites you. Something that you're passionate about. Something that you believe in. He says that's because other people will then believe in it too and will buy it from you. I personally don't care about (or, to be honest, have active antipathy towards) that particular outcome, but that's fine; that's not the point. The point is that a "why?" comes from the heart, from something that actually matters, and it creates a motivating and limiting principle. It defines both what you want to do and what you don't want to do.

That gives me a personal answer. My "why?" is that I want to build elegant solutions to problems and do work that I find engaging and can be proud of afterwards. I automate common tasks not because I particularly care about being efficient, but because manually doing common tasks is mind-numbing and boring, and I don't like being bored. I write reliable systems not particularly because that helps clients, but primarily because reliable software is more elegant and beautiful and unreliable software offends me. (Being more usable and less frustrating for clients is also good; don't get me wrong. It's just not a motive. It's an outcome.)

What does that mean for a group mission statement, a group "why?"

Usually these exercises produce some sort of distillation of the collective job responsibilities of the people in the group. Our mission is to maintain core infrastructure to let people do their work and to support authentication and authorization services for the university, yadda yadda yadda... this is all true, in its way, but it's also boring. One can work oneself up to caring about things like that, but it requires a lot of effort.

But we all have individual "why?" answers, and I bet they look more like my answer than they do like traditional mission statements. If we're in a place where we have the luxury of worrying about self-actualization questions, what gets us up in the morning, what makes it exciting to go into work, is probably some variation on doing interesting and engaging work. But it's probably a different variation for everyone in the group.

For example, as you can see from above, I like building things. My happiest moments are when someone gives me a clearly-defined problem statement that fills a real need and then goes away and leaves me in peace to solve it. One thing I've learned is that I'm not very good at coming up with the problem statements myself; I can do it, but usually I end up solving some problem that isn't very important to other people. I love it when my employer can hand me real problems that will make the world better for people, since often they're a lot more interesting (and meaningful) than the problems I come up with on my own.

But that's all highly idiosyncratic and is not going to be shared by everyone in my group. I'm an introvert; the "leave me alone" part of that is important. Other people are extroverts; what gets them up in the morning is, in part, engaging with other people. Some people care passionately about UI design. (I also care passionately about UI design, but the UI designs that I'm passionate about are the ones that are natural for my people, who are apparently aliens from another galaxy, so I'm not the person you want doing UI design for things used by humans.) Others might be particularly interested in researching new technology, or coming up with those problem statements, or in smoothly-running production systems, or in metrics and reporting... I don't really know, but I do know that there's no one answer that fits everyone. Which means that none of our individual "why?" responses should become the group "why?".

However, I think that leads to an answer, and it's the answer I'm going to advocate for in the meeting tomorrow. I believe the "why?" of our team should be to use the leeway, trust, and credibility that we have because we're successful to try to create an environment in which every individual member of the team can follow their own "why?" responses. In other words, I think the mission of our group should not be about any specific technology, or about any specific set of services, or outcomes. The way we should use our success is to let every member of our team work in a way that lights their fire. That makes them excited to come into work. That lets each of us have as much fun as I had in the past two days.

We should have as our goal to create passionate and empowered employees. Nothing more, but nothing less.

This is totally not how group mission statements are done. They're always about blending in to some larger overall technological purpose. But I think that's a mistake, and (despite disliking the presentation), I think that's what this TED talk does actually get at. The purpose is the what, or sometimes the how. It's not the why. And the why isn't static; technology is changing fast, and people are using technology in different ways. Any mission statement around technology today is going to be obsolete in short order, and is going to be too narrow. But I think the flip side is that good technological solutions to the problems of the larger organization are outcomes that fall out of having passionate and inspired employees. If people can work in ways that engage and excite them, they will end up solving problems.

We're all adults; we know that we're paid to do a job and that job needs to involve solving real problems for the larger organization. All of that is obvious, and therefore none of that belongs in a mission statement. A mission statement should state the inobvious. And while some visionary people can come up with mission statements around technology or around how people use technology that can be a rallying point for a team or organization, I think that's much rarer than people like to think it is. If you stumble across one like that, great, but I think most teams, and certainly our team, would be better served by having the team mission statement be to enable every individual on the team to be passionate about their work.

What should our group work on next? Figure out what the university's problems are, line those needs up with the passions of the members of the team, ask the people most excited about each problem how they want to solve that problem, and write down the answers. There's our roadmap and our strategy, all rolled into one.

2013-01-26: Introversion and mission statements

We had our tech leads retreat on Friday, including the discussion about "why," and I thought it was excellent, although I'm still trying to understand some parts of it.

Part of why I'm still working on that is, well, you know that introversion thing? I had five and a half hours of meetings on Friday, and the tail end was an intense four and a half hour retreat, during which I was a major participant the whole time. And tried to say some difficult and intense things, both about the "why" conversation that I posted about and about why the tail end of last year was particularly difficult for me.

All that went very well. I was very happy with the overall meeting.

Sometimes, I half-convince myself that I'm overstating the degree to which introversion or interpersonal interaction affects me, or that I'm using it as an excuse to duck out of things that aren't really what I want to be working on. Then I do something like this, and find myself completely dead on my feet at the end of the meeting. Not physically tired, but utterly and completely drained. I went home and did something completely right-brain and creative and needed that so desperately that I stayed with it until two in the morning, slept in today until past noon, and still have, despite a couple of minor attempts, absolutely no capability to write code or focus on accomplishing anything. I'm fairly sure I'll still be feeling the after-effects on Monday and Tuesday.

I'm a little annoyed at having both missed a day of releasing something to the world (I'd managed to post something every day this year until yesterday) and blown off exercising yesterday evening for only the second time this year. But I think part of coming to terms with what introversion means is being kind to myself about this. That's something Cain talked a lot about in her book, and it's the advice I'd give to any other introvert, but it's something I'm not applying as well as I could. Being an introvert doesn't mean not doing meetings. It means picking the ones that are important, doing them with all your heart, and then being kind to yourself afterwards just as if you'd put in extreme effort to do something quite energetic and difficult. Because you have.

So, I'm trying not to be annoyed at myself. And I think we need to be better about teaching people about this, because apart from Susan Cain, I don't remember many people writing about this, or understanding it. Occasionally there are screeds against meetings, but not a lot of discussion about when it's worth expending the energy, and about how to be kind to oneself and recharge after putting one's energy into the social interactions that are important.

I did want to mention that while retreats seem to have a bad reputation in tech circles (and I've heard of a lot of retreats that I would hate), the ones that we do in our group are great. We spent about an hour talking about the things I posted about on Thursday, then a couple of hours talking about the larger organizational roadmap and searching out things that aren't on our upcoming work roadmap that feel like large missing pieces, and then a couple of hours identifying and enumerating our technical debt and deciding what major pieces of technical debt we're going to tackle in the next six months. It's a lot of concrete, specific discussion of what we're going to spend time on and what we're going to defend resources for by pushing back against other priorities. If I had to sum it up in a sentence, I'd say that these retreats are all about listing vaguely equal things that exceed our total capacity and making the decisions about which of them we're going to do and which of them we're not going to do so that we don't have to keep wasting time on revisiting unresolved decisions.

The actual mission discussion turned into a discussion by each of us of what we most value in work, stated in terms very similar to what I used in my post, and that was wonderful. I found out things about my co-workers that I didn't know, and which will help me a great deal in improving their enjoyment of their work. I think we came away with a better idea of how to support each other in creating that engagement and excitement, and I think that's the best possible outcome. Since, like I said, I think that's the only real mission statement a group should have.

We did, still, talk about having a mission statement anyway. Apparently having a shared statement is really important to people who aren't me. I'm still struggling with this, since it doesn't make sense to me and doesn't feel useful to me. But maybe it's a sort of validation and confirmation that we're allowed to focus on what we care about?

But the group mission statement we came away with isn't about any specific technology and focuses on exactly the things that I care about: elegance, robustness, durability, and solving whatever problems the university has. As part of that discussion, we reached a surprising and really satisfying unanimity of opinion about what values we care deeply about. It's a value statement as opposed to a distillation of job descriptions, so if a mission statement is valuable for other people, that's one I'm quite happy with. Even if I don't understand why people want to have one.

I'm less satisfied with the roadmap and strategic plan for our larger organization, but I'm starting to realize that's because I'm utterly not the target audience and it's not going to engage or interest me. It's more of a marketing statement that's focused externally, and my proper role in that is to try to help ensure that it reflects the enthusiasms and engagement of the line staff, not to try to turn it into a document that will be inspiring for the line staff. Which ties into what I was writing about previously: in my ideal world, these sorts of focuses flow upwards from the people who will be doing the work, not downward from a high-level manager deciding on an inspirational direction.

2013-01-27: Management and the problem stream

Well, today was another day of sleeping, zoning, and having no brain for programming (and no willpower for that or exercise, which is even more frustrating), so you all get more musing about missions and work. Tomorrow, I will re-engage my normal schedule and turn the television off again, because seriously, two days of recovery should be enough even for a whole afternoon of meetings.

Perhaps the best concept in the scrum methodology for agile project management is the concept of the product backlog and the product owner. For those who aren't familiar, a scrum project has, as one of its inputs, a prioritized list of stories that the team will implement. During each planning session (generally between once a week and once a month), the development team estimates and takes from the front of the product backlog the top-priority stories that can be completed during that period of time. The product owner is responsible for building the list of pending development stories based on business needs and keeping them sorted in priority order so that the work of the team follows the goals of the larger organization. But the product owner does not get to say how difficult the stories are; the development team estimates the stories, and the product owner may change order based on those estimates (to do simpler things earlier, for example).

We've been doing scrum to some extent or another for quite some time now, and while there are a variety of things that I like about it, I think this is the best part. And I think it has very useful concepts for management in general.

The product backlog is effectively a problem stream of the sort that I talked about in my last couple of posts. It's a set of problems that need to be solved, that matter to other people. And, if the team is doing scrum properly, the problems are presented in a form that describes a complete unit of desired functionality with its requirements and constraints, but without assuming any sort of implementation. The problem is the domain of the product owner; the solution is the domain of the development team.

I am increasingly convinced that my quality of life at work, and that of many other people doing similar sorts of development and project work, would be drastically improved if managers considered this role the core of their jobs. (Ironically, at least in our initial experiences with scrum, it was quite rare for regular managers to be product owners; instead, individual line staff or project managers act as product owners. I think this speaks to how confused a lot of managers are about their roles in a technical development organization. This seems to be improving.) The core of the sort of work that I do is a combination of ongoing maintenance and solving new problems. Both of those can be viewed as a sequence of (possibly recurring) stories; I know this because that's largely how I do my own time management. Apart from HR and staffing, the core role of a manager is almost exactly what scrum calls "backlog grooming": communicating with everyone who has input into what their group is doing, finding out what their problems are, translating that into a list of things that the organization needs to have done, prioritizing them, breaking them into manageable chunks, ensuring they are fully specified and actionable (read: well-written stories), and communicating their completion back to other stakeholders (and then possibly iterating).

This leapt out at me when I started thinking about our larger strategic vision. That strategic vision is a sort of product backlog: a set of development stories (or, in this case, epics that would be broken down into a large number of smaller stories). But most strategic plans have glaring flaws if they're evaluated by the standards of scrum product backlogs. They're normally written as marketing statements and aimed at the clients or customers rather than at the staff. From the staff perspective, they're often hopelessly vague, not concrete, actionable epics. Often, they are almost entirely composed of grand, conceptual epics that any scrum team would immediately reject as too large and nebulous to even discuss. And often they're undefined but still scheduled: statements that the organization will definitely complete some underspecified thing within the next year or two.

Scrum rightfully rejects any form of scheduling for unspecified stories and epics. Additional vagueness and sketchiness is tolerated the farther the story is from the current iteration, so they can be put into the future backlog, but scrum makes no guarantees about when they'll get done until they've been sized and you can apply a reasonable velocity estimate. If you want to put a date on something, you have to make it concrete. This is exactly the problem that I have with long-range strategic plans. We already know this about technology development: long-range strategic plans are feel-good guesses, half-truths, and lies that we tell other people because we're expected to do so, but which no one really believes. The chance that we can predict anything about the shape of projects and rate of delivery of those projects three years from now is laughable, and certainly none of the work that would be required to make real estimates has normally been done for those sorts of long-term strategic projects.

There are a lot of scrum advocates who would go farther than I in asking for a complete revolution of how technical projects are planned. I'm not sure how realistic that is, or how well the whole scrum process would work when rolling up work across dozens, if not hundreds, of scrum-sized teams. But in the specific area of the role of managers in a development organization, I think we could learn a lot from this product backlog concept. Right now, there is a constant tension between managers who feel like they need to provide some sort of visionary leadership and guidance (the cult of Steve Jobs) and line staff who feel like managers don't understand enough about specific technology to lead or guide anything. But if you ask a technical developer whether, instead, it would be useful for managers to provide a well-structured, priority-ordered, and clean problem stream from the larger organization so that the developer can trust the problem stream (and not have to go talk to everyone themselves) but have control over the choice of implementation within the described constraints, I think you would find a great deal of enthusiasm.

As a further bonus, scrum has a lot of advice about what that problem stream should look like. I think some of the story structuring is overblown (for example, I can't stand the story phrasing structures, the "AS A... I WANT TO... SO THAT..." shackles), but one thing it does correctly emphasize is the difference between problem and solution. The purpose of the story is to describe the outcome from the perspective of the person who will use the resulting technology. Describing the constraints on the solution, in terms of cost or integrations, is actively encouraged. Describing the implementation is verboten; it's up to the project team to choose the implementation that satisfies the constraints. If managers in general would pick up even that simple distinction, I think it would greatly improve the engagement, excitement, and joy that developers could take in their work.

There's very little that's more frustrating than to be given an interesting problem in the guise of a half-designed solution and be told that one is not permitted to apply any of one's expertise in elegance, robustness, or durability to replacing the half-baked partial solution. Or even be allowed to know the actual motivating problem, instead of the half-digested form that the manager is willing to pass on.

2013-01-28: Much better

Indeed, two days was enough time to recover from an afternoon of social, despite having an hour and a half of meetings today. But today was a day of catching up, sorting issues in JIRA, doing lots of planning meetings, and doing some debugging, so I don't have anything new code-wise to give to the world. So have a photograph.

The product backlog is now all sorted out for the Account Services project into the new phases that will define the work through pretty much the end of the calendar year, I'm guessing. And everything is now a story, not a bug or enhancement or something else without story points, and a few things are shuffled into a more reasonable order.

I'm probably going to find a few hours this week to go play video games or work on personal projects, given that I've been working well over 40 hours a week since the start of the year.

2013-01-29: Quick Module::Build note

I'm struggling a little to keep to a schedule since the weekend, so I neither followed my normal evening exercise schedule nor wrote this up earlier when I could have added more details. Ah well. But I'm still going to post something today, since I like this streak of (mostly) providing some useful or at least hopefully entertaining content.

Today, I converted the Perl module build system inside WebAuth to use Module::Build instead of ExtUtils::MakeMaker, and it was a rousing success. I highly recommend doing this, and am going to be doing this with all of my other Perl packages, both embedded in larger packages and standalone.

I have several packages where Perl XS modules are embedded within a larger Autoconf and Automake project that includes a shared library used by the Perl module. The largest problem with doing this is integrating the build systems in such a way that the Perl module is built against the in-tree version of the shared library, rather than some version of it that may already be on the system.

I've managed to do this with ExtUtils::MakeMaker, but it was horribly ugly, involving overriding internal target rules and setting a bunch of other variables. With Module::Build and ExtUtils::CBuilder, it's much easier and even sensible. Both support standard ways of overriding Config settings, which provides just the lever required. So this:

    our $PATH = '@abs_top_builddir@/lib/.libs';
    my $lddlflags = $Config{lddlflags};
    my $additions = "-L$PATH @LDFLAGS@";
    $lddlflags =~ s%(^| )-L% $additions -L%;
    package MY;
    sub const_loadlibs {
        my $loadlibs = shift->SUPER::const_loadlibs (@_);
        $loadlibs =~ s%^(LD_RUN_PATH =.*[\s:])$main::PATH(:|\n)%$1$2%m;
        return $loadlibs;
    }
    package main;

(comments stripped to just show the code) became this:

    my @LDFLAGS = qw(-L@abs_top_builddir@/lib/.libs @LDFLAGS@);
    my $LDDLFLAGS = join q{ }, @LDFLAGS, $Config{lddlflags};

plus adding config => { lddlflags => $LDDLFLAGS } to the standard Module::Build configuration. So much better!
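For context, here is a rough sketch of how those pieces might sit together in a Build.PL.in template that configure turns into Build.PL. This is not the actual WebAuth build script: the module name is a hypothetical placeholder, and a real script would carry more settings. The config parameter to Module::Build->new is the documented way to override Config.pm values.

```perl
# Build.PL.in: configure replaces the @...@ markers to generate Build.PL,
# so this file is a template and is not runnable until that happens.
use strict;
use warnings;

use Config;
use Module::Build;

# Put the in-tree library directory ahead of any paths already in
# lddlflags so that the linker finds the freshly built shared library
# before any copy installed on the system.
my @LDFLAGS   = qw(-L@abs_top_builddir@/lib/.libs @LDFLAGS@);
my $LDDLFLAGS = join q{ }, @LDFLAGS, $Config{lddlflags};

my $build = Module::Build->new(
    module_name => 'My::Module',    # hypothetical module name
    license     => 'perl',

    # Override the Config.pm setting used when linking the XS module.
    config => { lddlflags => $LDDLFLAGS },
);
$build->create_build_script;
```

The whole override is ordinary data passed to the constructor, with no need to reach into generated Makefile rules.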

At some point, I'll write up a set of guidelines on how to embed Perl XS modules into Automake projects.

2013-01-30: Consensus failure

As most people reading this have probably seen, xkcd found a particularly entertaining Wikipedia argument. (It is all true. There are at least 40,000 words arguing over whether to capitalize an "i" in a page title. I went and looked last night and spent several hours following the conversation.)

This is, of course, funny, and I'm sure lots of people went away smiling at an excellent example of bikeshed painting on the Internet. But I think it's rather more serious than it appears to be. In fact, I would go so far as to say that it is a visible manifestation of a deeply-rooted problem that is the core of why I personally refuse to contribute to Wikipedia in any way that requires me to invest non-trivial time or become a member of that community.

To explain that statement requires a bit of background.

One of the very first services I found when I joined the Internet in 1993 was Usenet. Originally, I talked about comics collecting and then Magic: The Gathering, but I eventually ended up involved in Usenet governance. I spent more than ten years deeply invested in Usenet hierarchy administration; roughly, the process of agreeing on names for and sometimes moderators for global Usenet discussion groups, as well as all the supporting infrastructure for Usenet news server administrators to agree upon and share articles in those groups.

Usenet newsgroup creation was, throughout that period, a consensus-driven process. There was a voting system that was used to measure consensus, but the voting system was largely designed to ensure that rough consensus existed (it was heavily skewed towards not changing anything). We took great pride in that fact. I'm sure that one could find many articles from me proudly championing the concept of a consensus-based decision-making process against many alternatives.

That means I was also there when the toxic flaws at the heart of a consensus-based process completely destroyed it and, as a side effect, destroyed the community of people who had formed around it. I will never again invest my time and energy in a community that operates solely via a consensus process unless that community is small enough that consensus can actually work (which means not much more than twenty people). Consensus decision-making in large groups destroys communities and hurts people.

These sorts of extended arguments on Wikipedia are not rare anomalies. They happen with some frequency; just by going through the debate xkcd references and looking at pointers, you can easily find others (Sega Genesis, cat ownership vs. companionship, endless notability debates... there are many). They look funny from outside, but from inside they're hugely demoralizing. Even when they don't get heated and spill over into personal animosity (and if they don't do that, you're lucky), they consume astonishingly vast quantities of time, which is exactly the resource that you don't want to be squandering in volunteer projects. And they do so without giving back anything emotionally.

I know these arguments very well. One of the most contentious and angry arguments in the history of Usenet newsgroup creation was the argument over whether, when breaking up a group into multiple more focused groups, to rename the original group to end in *.misc or leave its existing name. This argument went on for years and led to deep, lingering hatred between the participants. It's the same sort of thing.

These arguments happen in the way that they happen because there is toxic waste at the center of Wikipedia's governance process. And that toxic waste hurts people, wastes their time, and destroys their willingness to volunteer. It undermines community, which is sad and ironic because the whole point of consensus decision-making is to create, support, and empower community. But it doesn't work. It doesn't work for two very simple but very closely-related reasons: it's impossible to ever end an argument, and there is no incentive for anyone to shut up. As a result, it is nearly impossible to get the decision-making to cohere into something that is timely, consistent, unambiguous, and final. These are not optional requirements of a decision-making process. Without them, people will lose faith in the decision-making process itself, and some people will learn that, as long as they never stop talking about something, they can prolong the debate indefinitely and possibly eventually get their way.

Now, of course, I'm deeply involved and invested in Debian. And on the surface, it may look like Debian also uses a consensus-based decision-making process. But while Debian decision-making has some problems, it's much healthier for a few reasons that are useful to examine:

Individual packages in Debian are not managed by consensus, but rather fall under the authority of the package maintainer, who is either a single person or a small team. (Consensus works in small teams; it's great if the number of people involved is five or less.) This drastically limits the number of decisions that are subject to a large-scale consensus process. Nearly all decisions in Debian are made directly by the person doing the work, and only in exceptional circumstances can some larger process be invoked.

This, on the surface, sounds like it could be similar to Wikipedia, but note that appeals to a larger process are much rarer in Debian than they are in Wikipedia. There is a strong bias in favor of letting the person working on the package do their job however they want to do it. This is more social than structural, but it's a core value (and we should be very careful about changing it).

As a result, Debian operates less by mass consensus and more as a federation of semi-autonomous fiefdoms. This sounds worse: it sounds authoritarian and non-democratic. But it's significantly more functional, and it doesn't waste people's time.

On appeal, and for broader questions that necessarily cut across multiple packages, there are several people in Debian who are perceived as having clear authority to make timely and relatively final decisions, who are almost never overruled, and who can therefore effectively end arguments. These include core teams like ftp-master or the release team, formal appeal bodies like the Technical Committee, and the Debian Project Leader. Not all of this ability is written in black and white in the governance process, but that's not the important part; the important part is that it works in practice to provide timely, consistent, unambiguous, and final decisions most of the time. (Some parts work better than others, and we do struggle with timely, but I've seen much, much worse.)

There is a well-established ultimate appeal (a General Resolution), the ultimate appeal process is very specific and concrete, there is little or no ambiguity or human interpretation involved in analyzing the results, it is time-limited, and it's almost universally respected. Everyone abides by the result of a GR if matters get to that point, and while it's theoretically possible to propose another GR immediately to overturn the first one, this doesn't happen in practice. There's also substantial pushback against invoking the GR process for anything that isn't really important.

We still occasionally devolve into endless discussion, usually on some technical topic that cuts across the distribution and usually when the options haven't cohered enough for any authority to make a useful decision (often because it's too early for it to be possible to make a final decision). Those cases, such as the recurring systemd vs. upstart discussion, are still quite frustrating. But they're also relatively rare and largely ignorable because they have to be about just the right sort of issue. Wikipedia has these sorts of arguments all the time, and they're much less ignorable, because consensus is invoked for every minor dispute on any page in the entire project. You have to participate in endless consensus process or risk having your work undone, reverted, or removed.

We would all like all decisions to be made by consensus, since we don't want anyone in a volunteer project to lose (and because we all love to convince other people we're right). And for small teams, consensus is wonderful. But that's precisely because governance in small teams is very easy; humans are naturally adapted for decision-making in small teams and have a huge variety of natural tools, tendencies, and abilities devoted to making those sorts of decisions. You rarely need any formal decision-making process; most decisions will just happen.

You need a decision-making process when enough people are involved that mutual social pressure can't bring a dispute to resolution, and when you're in that situation, consensus is a horrible process.

Due to my past experiences, I'm almost certainly at the extreme in dislike of consensus processes. Most people are not going to refuse to participate in communities solely on that basis. But be aware that 40,000 words of discussion about something that an outsider will not consider particularly important is not an accidental phenomenon. It's not something that just happens randomly on the Internet. It's an effect, and symptom, of a governance process, or lack thereof, and it's something that you can do something about if you want to.

2013-01-31: WebAuth 4.4.1

WebAuth is the site-wide web authentication software that we use at Stanford. We're about to start another round of intensive development on multifactor authentication support, so I wanted to get a minor release out with the other things that have accumulated in the past month and a half.

The main feature change in this release is to add support for another WebLogin configuration callback. This one is run whenever WebLogin is attempting to establish the user's identity and can return any identity it wishes, so it's available as a generic callback to determine identity information from the environment. The most likely use will be to inspect the results of client-side certificate authentication (which Apache puts in various environment variables) and interpret them in a site-specific way.

Also in this release I refactored the WebLogin scripts so that they use FastCGI more correctly. The application objects are now instantiated only once and then reused for the lifetime of the FastCGI script, rather than torn down and set up for each new request. This is primarily of interest to those using the new features that require a memcached connection, since that connection will no longer be constantly set up and torn down, but it should help with speed in other situations. This is the riskiest change, since if I didn't get it exactly right it would be possible for information to leak from one request to the next. I've audited it fairly carefully, though, and think I've gotten everything.
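For readers unfamiliar with the pattern, here is a minimal sketch of that structure. It is not the WebLogin code: it assumes CGI::Fast as the FastCGI interface, and My::LoginApp and its methods are hypothetical names invented for illustration. The long-lived object is built once outside the request loop, and everything tied to a single request is replaced wholesale at the top of each iteration, which is the part that has to be right to avoid leaking data between requests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use CGI::Fast;

# Hypothetical application class standing in for a real application object.
package My::LoginApp;

sub new {
    my ($class) = @_;

    # Expensive, long-lived setup (such as opening a memcached
    # connection) would happen here, once per process.
    my $self = { request => {} };
    return bless $self, $class;
}

# Throw away all state from the previous request and start clean.
sub reset_request {
    my ($self, $query) = @_;
    $self->{request} = { query => $query };
    return;
}

# Handle one request using only the per-request state.
sub handle {
    my ($self) = @_;
    my $query = $self->{request}{query};
    print $query->header('text/plain'), "ok\n";
    return;
}

package main;

# Instantiate once, then reuse for every request this process serves.
my $app = My::LoginApp->new;
while (my $query = CGI::Fast->new) {
    $app->reset_request($query);
    $app->handle;
}
```

Keeping all per-request data under a single hash key that is replaced on every iteration is one way to make the audit tractable, since anything stored elsewhere on the object is, by construction, meant to persist.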

Finally, besides some documentation updates, I've switched the Perl build to Module::Build from ExtUtils::MakeMaker. This should mostly be transparent, but it means some additional Perl modules from CPAN will be required to build the distribution with --enable-webkdc for versions of Perl older than 5.10. (5.10 is fairly old, so I doubt this will be a serious issue.)

You can get the latest release from the official WebAuth distribution site or from my WebAuth distribution pages.
