Posts for July 2007

2007-07-03: Small book haul

It's been a while since I made a book order, and several new books have been released that I want to put near the top of my queue.

Elizabeth Bear -- Whiskey and Water (sff)
Jacqueline Carey -- Kushiel's Justice (sff)
John C. Wright -- Orphans of Chaos (sff)
John C. Wright -- Fugitives of Chaos (sff)

We'll see if I like the Wright series better than John Clute did.

2007-07-04: Debian Policy

Last fall, after having thought about it for a while during my vacation in October, I volunteered to help with the Debian Policy process in the midst of some particularly pointless political blowups. I, along with various other people, became part of a new delegated Policy committee, and then I promptly didn't do very much for a variety of reasons.

I've been feeling rather guilty about that ever since, but apart from a burst of work on a couple of outstanding issues earlier this year, the guilt didn't get me to do anything. Last night I started working on processing Policy bugs on a whim, worked up some momentum, and so I ended up spending all of today trying to get on top of the current state of the Policy package.

So, current status is that I have six bug fixes committed to my local arch repository and five more in various states of completeness, which feels like a fairly good start. And I went through and triaged all of the open Policy bugs and taught myself how to use the BTS usertags in the process (here are the results). Then, after closing my eyes for a bit (today has been incredibly tiring for some reason), I did some research into the history of Debian Policy so that I have a better sense of what problems have been outstanding for long periods of time and what problems previous Policy maintainers ran into.

The main problem with Policy at the moment is that only Manoj and I are working on it, and Manoj has Grand Plans that will be great if they happen but which won't help in the short run with keeping the bug count down and making forward progress. So we need more people. Which has been a problem for a while, although I've been gratified by the number of people willing to discuss issues.

2007-07-06: Unicode spam

I got home from a nice dinner with friends, discovered that it was ten degrees cooler in my bedroom than in the rest of the apartment, and decided that was a sign that I should settle in with my laptop. I felt like hacking on something but didn't feel like tackling any of the work-related stuff that I'm "supposed" to be doing. So I decided to start poking at my mdfrm program to improve its UTF-8 handling.

I wrote mdfrm eons ago in Perl when I was using qmail. It's a replacement for the venerable frm program (originally part of elm), which shows you a summary of From and Subject of all your new mail. I run it constantly to see what mail I have pending, and none of the replacements in various other packages quite do what I want.

mdfrm is great, but it's showing its age. Among other things, I wrote it when I was still using a C locale for everything, and it had special-case decoding only of ISO 8859-15 in RFC 2047 encoding. Some time back, I switched to using UTF-8 across the board to match the current trend on Linux in general, and now mdfrm has been spitting out ISO 8859-15, creating invalid UTF-8, and thereby mangling its output (particularly in the presence of spam).

So, the first step was to research Perl encoding libraries, which have come a long way. The Encode module (with all its subclasses) comes with Perl these days and can convert from all sorts of character sets into UTF-8. Even better, I discovered that one of the encodings it supports out of the box is RFC 2047 encoding. So problem solved for anyone who uses MIME properly; if I see an RFC 2047-encoded string, I just pass it through the appropriate decode command and I get UTF-8 back.
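For anyone who wants to try the same idea outside of Perl, here is a minimal Python sketch of RFC 2047 header decoding. It is not the mdfrm code: it uses Python's standard email.header module in place of Perl's Encode, and the function name decode_rfc2047 is made up for illustration.

```python
from email.header import decode_header


def decode_rfc2047(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to a str.

    Hypothetical helper for illustration; unknown or bogus charsets fall
    back to Latin-1 so that the function never raises on bad input.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # The header named a charset Python does not know about.
                parts.append(chunk.decode("latin-1"))
        else:
            # Headers without any encoded-words come back as plain str.
            parts.append(chunk)
    return "".join(parts)
```

Headers that contain no encoded-words pass through unchanged, so this can be applied to every From and Subject line without checking first.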

Now, a lot of my spam is in Asian character sets, so even with that fix, I was getting a bunch of square blobs. I use the Neep font by Jim Knoble, which has been extended in Debian to include the basics of Unicode but which doesn't have any CJK (Asian -- Chinese/Japanese/Korean) characters. Worse, the square blocks also weren't aligned properly; Perl (via format output) clearly thought they were wider than xterm thought they were. Besides, square blocks are lame; other X programs can use font sets and pull characters not found in a primary font from another font. I wanted xterm to do the same. So I started doing research.

Turns out, xterm can, and even tries to do so automatically. But this is very poorly documented unless you know exactly what to search for in the xterm manual page. xterm uses two fonts: a regular font and a wide font. By default, for wide characters, it tries to find a font double the width of the current font. In the man page, it just talks about "wide characters," but in practice what that means is CJK characters. Now, Neep doesn't provide a wide variant, so the autodetection doesn't work, but it turns out that you can set a wide font separately from the main font using the -fw command-line option and the corresponding wideFont X resource.

Setting that appropriately, I can now see Asian characters when the From and Subject use RFC 2047 encoding! The misc-fixed font works just fine as a wide font with Neep as the narrow font.

But the alignment is still broken. So I need something that will tell me, in mdfrm, what characters take up two character cells and what characters take up only one. This sounds like something that I should be able to get from a Unicode property, and indeed, Unicode Standard Annex #11 has exactly the properties I want. Unfortunately, this isn't one of the classes supported in Perl 5.8 according to perlunicode.

A Google search later, I turned up Unicode::EastAsianWidth, which seemed to do exactly what I wanted. Hm, not packaged for Debian. Well, I can fix that later. However, more fatally, it turns out that it just doesn't, er, work. Later inspection and comparison with the data file reveals that for some reason the module is missing vast swaths of wide character blocks, including the entire main CJK block. Weird. I suppose this may be due to it not having been updated in years.

It turns out that all the CJK characters I get in spam are concentrated in a few large blocks, so right now I'm just hard-coding character ranges into the new version of mdfrm based on the Annex #11 data file. Maybe at some point I'll do something more thorough, but this works. After lots of fighting with Perl formats and then with printf, I gave up on trying to do alignment with Perl's built-in facilities and just wrote the alignment code by hand.
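As a rough illustration of hand-written alignment, here is a Python sketch. It differs from the approach described above in one important way: instead of hard-coded ranges, it asks Python's unicodedata module for the Annex #11 East Asian Width property. The names display_width and pad_to_columns are hypothetical.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Terminal cells used by one character: 0, 1, or 2."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn on top of the previous cell
    # "W" (wide) and "F" (fullwidth) take two cells in a terminal.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    """Total number of terminal cells the string occupies."""
    return sum(char_width(ch) for ch in text)


def pad_to_columns(text: str, columns: int) -> str:
    """Truncate or space-pad text so it fills exactly `columns` cells."""
    out = []
    used = 0
    for ch in text:
        width = char_width(ch)
        if used + width > columns:
            break  # a wide character that doesn't fit is dropped whole
        out.append(ch)
        used += width
    return "".join(out) + " " * (columns - used)
```

Padding after truncation matters: a two-cell character that doesn't fit in the last remaining cell leaves a gap that has to be filled with a space to keep the next column lined up.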

That left the problem that most of my mail is spam and spammers don't use MIME, or standards in general. Instead, most of my non-English mail is in a hodgepodge of native Asian encodings. Enter Encode::Guess, which looked quite promising. Unfortunately, due to how it works, there are some character sets that it will think are always good, so if you include them in the guess list, you get ambiguous results. It doesn't do any character frequency analysis, just looks for characters that are completely invalid for a given encoding, so an encoding like ISO 8859-15 will almost always succeed since nearly every code point is a valid ISO 8859-15 character.

To make a long story short, after a lot of fiddling, I worked out a calling sequence that tries various encodings from strictest to loosest, and now I have something that seems to work. I can recognize Korean spam, ISO-2022-JP with its ESC codes for shifting, and a few other common encodings even without RFC 2047 tagging, alignment now works (after adding a few more character ranges), and after discovering an xterm segfault bug that I should report, I got xterm working with a proper alternate font. So now, when I run frm, I see Asian and Cyrillic characters mixed in with Western European characters, which makes the spam much more fun. And avoids all the alignment and ugly display problems.
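The strictest-to-loosest idea can be sketched in Python as follows. This is not the actual Encode::Guess calling sequence, which isn't shown here; the candidate list, its order, and the name guess_decode are all assumptions chosen for illustration.

```python
def guess_decode(raw: bytes):
    """Return (encoding_name, text) using the first encoding that fits.

    Encodings are tried from strictest to loosest. The list and order
    are illustrative assumptions, not the ones mdfrm uses.
    """
    # ISO-2022-JP is 7-bit, so plain ASCII decoding would "succeed" on
    # it. Check for its ESC shift sequences before anything else.
    if b"\x1b" in raw:
        try:
            return "iso2022_jp", raw.decode("iso2022_jp")
        except UnicodeDecodeError:
            pass
    for encoding in ("ascii", "utf-8", "euc_kr"):
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nearly every byte is valid ISO 8859-15, so it would always "win"
    # if tried earlier; keep it as the last resort.
    return "iso8859-15", raw.decode("iso8859-15", errors="replace")
```

As with Encode::Guess, this only rejects encodings for which the bytes are outright invalid. It does no frequency analysis, so short strings can still be guessed wrong.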

I'm probably going to release the new version of mdfrm sometime this weekend under a different name and keep the old version around unchanged, since the new version requires Encode and Perl 5.8 and won't work at all for people who aren't using UTF-8.

Next for Unicode conversion is to change the declared character set for all of my web pages to UTF-8 so that I can start using non-ISO-8859-1 characters on my web pages.

2007-07-07: Used and new book haul

A friend wanted to sell books to a used book store, so we took that as an excuse to hang out, wander through used book stores (and one new one we ran across while wandering through downtown Mountain View), and have a nice lunch on Castro. Here are the spoils:

Greg Bear -- The Serpent Mage (sff)
John Brunner -- The Sheep Look Up (sff)
Steven Brust -- Teckla (sff)
Steven Brust -- Athyra (sff)
Steven Brust -- Phoenix (sff)
Maggie Furey -- Sword of Flame (sff)
Maggie Furey -- Harp of Winds (sff)
Maggie Furey -- Dhiammara (sff)
Robert A. Heinlein -- Glory Road (sff)
Ellen Kushner -- The Privilege of the Sword (sff)
Sharon Lee & Steve Miller -- Pilots Choice (sff)
Valery Leith -- The Riddled Night (sff)
Scott Lynch -- The Lies of Locke Lamora (sff)
Chris Moriarty -- Spin Control (sff)
Diana L. Paxson -- Lady of Darkness (sff)
Frederik Pohl -- Slave Ship (sff)

Mostly filling in subsequent books of series where I already owned the first book but may want to read them all back-to-back when I finally get to them.

2007-07-09: Switching back to Emacs

Let's see if blog posting works.

After watching the direction and velocity of development for a while, I've decided to switch back to Emacs from XEmacs. At the time that I went with XEmacs, many years ago, it was under far more active development and had a lot of features that Emacs just didn't (particularly around colorization, Unicode support, and many supported Lisp library extensions). Now, however, XEmacs development appears to be essentially dead, Emacs development is more alive (and just released a new major version), and all the things that XEmacs had Emacs seems to now have.

I've spent the evening tweaking my configuration for the switch, and in the process doing a few other configuration changes I've wanted to do for a while (switching from Mailcrypt to PGG and from inline PGP signatures to PGP/MIME, switching away from Supercite and my hacked functions to the built-in Gnus citation engine). So far, so good. I like the small bar down the left side that Emacs uses for additional status information.

There are two things I'm currently missing. First, the Emacs cut buffer support in X appears to be broken entirely. I'm not sure what's going on with that; hopefully I can find a fix, since otherwise it's going to get quite annoying. Second, I liked XEmacs's habit of making the cursor smaller when at the end of a line, which I found quite handy. Emacs doesn't appear to support that, but I'll probably instead enable the trailing whitespace highlighting.

2007-07-10: mdfrm-utf8 1.0

This is the result of the work that I did late last week. I finally finished putting it together in a releasable form, which mostly involved updating my web pages. And fixing a cvs2xhtml problem in the process, of course, since one always has to do one more thing.

mdfrm-utf8 is now the supported version of mdfrm; the old version, which assumes ISO 8859-15 and can't deal with internationalization at all, has now been retired and probably won't see another release. I did release one more version with documentation updates to mention that.

mdfrm-utf8 requires a new Perl and assumes a UTF-8 locale, which should be true by default of all current Linux distributions and I would hope other variations of Unix at this point. There are still some configuration options that I want to address and some weirdness around character set guessing, but I'm quite happy with it.

You can get the first release from the mdfrm distribution page.

2007-07-10: Emacs and X cut/paste

Apparently the magic incantation that tells Emacs to behave like any other normal X application instead of like a broken and bizarre throwback is (mouse-sel-mode 1). After running that in my init scripts, mouse cut and paste via the middle button now works the way that it does with everything else. (I have no idea if the clipboard works and don't care; I basically never use it.)

The Emacs manual was completely unhelpful. I figured this out by grepping through the elisp source. (Although, he says optimistically, maybe that's because they fixed this in Emacs 22, which I'm not running yet?)

This was the kind of thing that got me to switch to XEmacs in the first place. Emacs has historically had a ridiculous case of Not Invented Here and you have to track down various weird extensions just to make it behave in a vaguely sane fashion. But wow, it's a lot faster than XEmacs and has a lot of other things going for it. Now that I've fixed a bunch of stupid defaults, I'm quite happy with it.

2007-07-10: reminder 1.7

I got tired of always typing reminder new and having it tell me that was an unknown command, so I made new a synonym of create. Now I don't have to remember which one I'd decided on.

You can get the latest version from the reminder distribution page.

2007-07-11: cvs2xhtml 1.12

When I updated cvs2xhtml to cope with the new format for cvs log messages, I didn't adjust it for when the new format was used for the first log message (which unlike the others doesn't have line change information). Of course, with mdfrm-utf8 last night, I ran into that. That's now been fixed.

You can get the latest version of cvs2xhtml from my web tools distribution page.

2007-07-12: Active Directory unicodePwd

In case anyone else ever needs this:

You can change the password of an Active Directory account by changing the unicodePwd attribute (or setting it, for that matter, for new accounts). However, you have to jump through some security hoops.

First, you have to use TLS (and if you're using OpenLDAP clients, make sure that TLS_CACERT in your ldap.conf points to the right root cert). Since you're using TLS, if you're using GSSAPI, you need to tell GSSAPI not to negotiate a security or privacy layer since AD doesn't support nested security or privacy layers. (Dumb.) The magic incantation for ldap.conf is SASL_SECPROPS minssf=0,maxssf=0.

Then you need to set the attribute. Windows uses little-endian UCS-2 as the character set (which they, unhelpfully, call "Unicode" in all their articles, as if there's only one Unicode encoding). Perl, for example, defaults to big-endian UCS-2 if you just say to use UCS-2.

The password also has to be enclosed in double-quotes. The double-quotes aren't part of the password. I have no idea why Windows does this.

So, to transform the password in a Perl script into the string that AD wants, try: encode("ucs-2le", qq{"$password"}). (This assumes you're using the Encode module that comes with Perl 5.8 and takes care of the charset issues for you.) Then, if you're putting that password in LDIF (because, for instance, you're piping it into ldapadd), you need to base64-encode it. Use MIME::Base64 and then call the encode_base64 function. Finally, you can take the resulting string and put it into the LDIF as:

unicodePwd:: <base64-encoded-string>
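For comparison, here is the same transformation as a Python sketch. Python has no codec named UCS-2, so this uses utf-16-le, which produces identical bytes for characters in the Basic Multilingual Plane. The function names are hypothetical.

```python
import base64


def ad_unicode_pwd(password: str) -> str:
    """Return the base64 text for an LDIF unicodePwd:: value."""
    # AD wants the password wrapped in double quotes (which are not part
    # of the password itself), encoded as little-endian 16-bit units.
    quoted = '"' + password + '"'
    return base64.b64encode(quoted.encode("utf-16-le")).decode("ascii")


def ldif_line(password: str) -> str:
    """Format the attribute line as it would appear in the LDIF."""
    # The double colon tells LDIF parsers that the value is base64.
    return "unicodePwd:: " + ad_unicode_pwd(password)
```

The quoting has to happen before the encoding, so that the quote characters themselves end up as two-byte units.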

It took Ross and me far too long to figure this out. Ross found the last key detail on some web site where the person was byteswapping the encoded output to get around the little- vs. big-endian problem.

2007-07-13: krb5-sync 0.6

Well, it took me all day to get a good release since I kept finding just one more bug after rolling back some temporary code that I thankfully didn't have to keep. But it's finally out.

This release has all of the additional logic that I needed to be able to propagate iPass instances into Active Directory so that we can let our iPass users use an alternative principal. This is unfortunately necessary as the iPass security model is a bad joke. In the process of working on that project (the code for which will also get released, at least as much as it makes sense to do so, but which I haven't had time to put web pages up for), I also learned a lot more about Active Directory's data model and cleaned up a few other things in the sync code. There are also logging improvements and general cleanliness improvements.

You can get the latest version from the krb5-sync distribution page.

2007-07-15: spin 1.64

I hit my first review featuring a character not available in either the standard HTML named escape set or in ISO 8859-1, and since I've now switched to a UTF-8 locale for everything else, I finally changed spin to generate pages with a UTF-8 encoding by default. I've also reconfigured my web server to not declare a character set.

At some point, probably before too long, I'll change the rest of my web tools similarly.

You can get the latest version from my web tools distribution page.

2007-07-17: Shōgun

Review: Shōgun, by James Clavell

Publisher: Dell
Copyright: 1975
ISBN: 0-440-17800-2
Pages: 1211

Shōgun is famous enough that you've probably heard of it. It's a huge historical drama set in Japan around 1600, mixing the adventures of an English pilot shipwrecked in a Japan whose trade is dominated by Portugal with Japanese politics around the rise of the Tokugawa Shogunate. It's been adapted for television as a nine-hour miniseries and has been reprinted repeatedly.

The story is based on an English pilot named William Adams, believed to be the first Briton in Japan when he wrecked there in 1600. Renamed John Blackthorne for the novel, his original impression of Japan is overwhelmingly negative and strange. He and his men are taken captive and held in a pit, treated badly by the Japanese (who consider them filthy barbarians) and shocked by the apparent cruelty and disregard for life in the Japanese culture. The first hundred pages are difficult and not particularly entertaining reading, showing the Japanese as vicious torturers and featuring vivid accounts of filth, disease, scurvy, and violent clashes between the shipwrecked crew and the Japanese. The Portuguese and Spanish control the Asian sea trade at the time and their influence in Japan is dominated by Jesuit Portuguese priests, which adds Catholic vs. Protestant conflict and some tedious yelling about who is damned.

Clavell is setting up a contrast of cultures and a starting point from which Blackthorne's understanding will evolve, but that isn't obvious at first and the start seems quite racist and requires some patience. While he moves frequently between characters and viewpoints, the moral arc of the book and the portrayal to the reader follows Blackthorne's understanding; at the start, the Japanese look unbelievably violent and uncivilized because that's how Blackthorne sees them. By the end of the book, the portrayal at the start will be cast in an entirely new light. But one does have to endure an opening the length of many books that I, at least, found unappealing before the true meat of the story begins.

Once Blackthorne is finally in a more stable position among Toranaga's allies, the story starts to build its main plots: the twisty and complex struggle for political control of Japan among many internal factions (manipulated at times by the Portuguese Jesuits), and a strong romantic sub-plot between Blackthorne and a native Japanese woman. Japanese politics are the plot driver for the book, causing most of the conflict and pushing the characters through the story, but the emotional center of the novel is Blackthorne's slow understanding of Japanese culture, growing appreciation for their world view, and attempt to hold to his core beliefs while adapting and fitting in. Although melodramatic in places, it's an excellent portrayal of a culture clash and slow growth of understanding. Clavell effectively uses points of agreement between Japanese culture and modern culture to tie the reader's experience to Blackthorne's, de-emphasizing those agreements initially when Blackthorne is repulsed and slowly bringing them to center stage as he acclimates. (The handling of personal cleanliness is an excellent example.) Clavell does an excellent job of leading the reader towards seeing the Japanese way as better than the European way at the same rate that Blackthorne does.

In other areas, Clavell is not as skilled. Some of the length of Shōgun comes from the complexity of the plot, but some of it feels like bloat. Clavell has a good enough grasp of pacing that the novel is always moving, but particularly in the early going I occasionally wished he'd move faster. Much of the length is detailed analyses of the emotions of each character, often the same emotions restated in slightly different terms at multiple points in the book. This makes for easy reading, as the reader is never required to figure out character motivations for themselves, but it adds to the melodrama and can feel overwrought. It's hard to write a novel of over 1,200 pages that warrants every page, and Shōgun doesn't, quite.

Clavell also uses (and abuses) the wandering omniscient narrator. I used to not notice this, but after having spent the past few years reading authors with a far tighter grasp of narrative focus and viewpoint, Clavell's lack of discipline bugged me throughout Shōgun. The reader gets into the head of just about everyone and the viewpoint character shifts frequently enough to be confusing. The story will be deep in the head of one character, hearing their emotions, dreams, and perceptions, and then it will be just as deeply focused on another character a few paragraphs later without transition or scene break. I occasionally had to re-read paragraphs to find the transition point between one viewpoint and another.

But, writing problems aside, this is an enjoyable epic. Clavell isn't the best stylist or most disciplined writer, but I think he succeeds at his effort to portray Japanese culture through the lens of a Western explorer and to show how completely one's opinion of a culture can change after exposure and thought. The historical basis of the plot is apparently reasonably accurate and dramatic enough to hold one's attention. Despite the long-winded writing, the last five hundred pages flew by for me and I found myself returning to just read a little more.

Save this one for when you have the patience for a long book and tolerance for dramatic love affairs and larger-than-life characters, but it's worth reading, more accurate than you might think, and surprisingly compelling. Just expect to hold your nose through the first few hundred pages until the story really starts.

Rating: 8 out of 10

Permanent review page

2007-07-18: krb5-strength 0.5

In testing our new Kerberos master software, we discovered that we weren't rejecting passwords based on the principal name. On further investigation, I found that was because we were comparing the password to the fully-qualified principal name. krb5-strength now also checks against the unqualified principal name and the reversed version of that.
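The check can be sketched in a few lines of Python. This is not the krb5-strength source, only an illustration of the comparison described above; the function name is hypothetical and the exact, case-sensitive matching is an assumption.

```python
def is_based_on_principal(password: str, principal: str) -> bool:
    """True if the password is just the principal name in some form.

    Illustrative only: compares against the fully-qualified principal,
    the part before the realm, and that part reversed.
    """
    # "alice@EXAMPLE.ORG" -> "alice"; a name without a realm is kept whole.
    unqualified = principal.partition("@")[0]
    candidates = {principal, unqualified, unqualified[::-1]}
    return password in candidates
```

Comparing only against the fully-qualified name misses the common case, since hardly anyone types the realm as part of a password.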

You can get the latest version from the krb5-strength distribution page.

Last spun 2024-01-01 from thread modified 2018-01-08