Part of the reason that last post went up so late at night was that when I finally finished writing it and tried to run it through the ancient Python script that “manages” ZedneWeb, I suddenly got an uncaught exception.
Keep in mind, I've been using this script since 2003. Before last night, I believe it’s last-modified date was sometime in 2004. Believe me, this was something that had never happened before, because if it had I would have fixed it long ago.
So what happened? It took me a while to track down, since I had to re-familiarize myself with the code, and with Python, but eventually I realized it had to do with the bit that copies the post content into the page template. It’s literally just a search-and-replace1, but the problem is that I implemented it using
result = sre.sub("<#content#>", content, result)
Looks simple, right? Takes a string, replaces any occurrences of
<#content#> with the post content, and uses that as the new string. What could go wrong?
Well, it turns out, Python helpfully interprets the replacement string by looking for backslash escapes. Replacing, e.g., the literal text
\n with a newline. More importantly, it looks for group identifiers of the form
\g<name>, in case you were using a regular expression that defined groups.
This is, quite simply, a bug. One that stood for nearly ten years simply because I had never posted any blog entries containing the string
\g. Until last night, when I posted a bunch code written in Haskell, which uses the backslash to introduce anonymous functions. Post some code defining a variable
g and boom. Crash city.
From what I can tell, the correct2 way to do this would have been to use
result = result.replace("<#content#>", content)
I made the changes, ran the new post through, and everything worked out just fine.3
A bunch of web security gurus just winced at the idea of doing string substitution to put content in a template. They needn’t worry: the code only runs locally on content I control, so the only security risk is me. Anyone who could possibly take advantage of that would already have access to my web host, making the point moot.↩
Well, more correct. The actual correct way would be to do something more robust than string replacement.↩
Until I realized I had posted broken links to the images. And there were probably a few other last-minute fixes as well.↩