Three interesting articles on diverse topics. I’d write more, but it’s late and I want to note these before I forget them.
- “Homeland Insecurity”, from The Atlantic Monthly, an interview with Bruce Schneier which goes into great detail about what does and does not work in terms of securing things and the need for “ductile” systems which can survive partial failure. The true test of a system is not whether it prevents all attacks (which is impossible) but how it can respond when defenses fail. Well worth reading, even if you have no experience with computer security. (via Ray Ozzie)
- “10 Tips on Writing the Living Web”, from A List Apart, discusses strategies for building a successful “live” web site (i.e., one that is frequently updated). ZedneWeb fails the “update consistently” test, as things get posted here pretty much arbitrarily according to my whims and motivation, but I do a pretty good job of leveraging my archive.
- “A Plan for Spam”, by Paul Graham, describes an experiment using Bayesian filters to assess the probability that a given message is spam. It turns out that mathematics and simple pattern-recognition are far more effective than trying to come up with a list of identifying characteristics of spam. It also has the advantage of providing meaningful numbers: even if an assessed probability is inaccurate, it’s still clear what a probability is. (via Wes Felter)
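To make the spam item concrete: Graham's filter assigns each token in a message a probability of appearing in spam (estimated from spam and non-spam corpora) and then combines the most interesting tokens' probabilities with the naive-Bayes combination rule. Here's a minimal sketch of just that combining step, in Python; the token probabilities are invented for illustration, not taken from any real corpus.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into one overall score.

    Uses the naive-Bayes combination from "A Plan for Spam":
        p = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn))
    """
    p_spam = prod(token_probs)
    p_ham = prod(1 - p for p in token_probs)
    return p_spam / (p_spam + p_ham)

# Hypothetical per-token probabilities for a suspicious message;
# a few strong "spammy" tokens dominate one innocent-looking one.
tokens = [0.99, 0.95, 0.90, 0.20]
print(round(combined_spam_probability(tokens), 4))
```

Note how the output is a genuine probability between 0 and 1, which is what makes the numbers meaningful even when an individual estimate is off.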
Actually, that last item does provide some food for thought, but it’s on a tangent, so bear with me. A while back, Apple introduced an Internet search feature to the Mac OS called Sherlock. It worked by sending your query to a set of search engines, parsing the results, and combining them into a list. One of the reasons I didn’t use it much (aside from the fact that it still wasn’t as good as Google) was that the combining process tried to sort the results by “relevance”, which it did by comparing the numbers given by the various search engines.
You can see the problem, right? Putting aside the question of what those numbers mean, do the various search engines even operate on the same scale? Lycos might be ranking pages from 1 to 100 while Excite could be going from -1 to 1 or something. There’s no way to normalize the numbers because they probably don’t even measure the same thing. (Also, the ability of those older search engines to assess relevance was pretty pathetic. There’s a reason Google conquered the world.)
Now, if the engines had all reported the probability that a given page was relevant (as a percentage or a fraction between 0 and 1) then they would at least be comparable. It would also allow for some interesting comparisons if two engines returned the same page. #
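For instance, if you treat two engines' relevance probabilities as independent pieces of evidence about the same page, Bayes' rule gives a natural way to merge them. This is a hypothetical sketch, assuming engines that actually report probabilities (which, as noted, they didn't):

```python
def combine_estimates(p1, p2):
    """Merge two independent probability estimates for one page.

    Combining the odds from each estimate (Bayes' rule, assuming
    independence and a uniform prior) gives:
        p = (p1*p2) / (p1*p2 + (1-p1)*(1-p2))
    """
    return (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))

# Two engines that both lean toward "relevant" reinforce each other:
print(round(combine_estimates(0.8, 0.6), 4))
```

Two estimates above one-half push the combined probability higher than either alone, while a confident "relevant" and a confident "irrelevant" largely cancel out, which is exactly the kind of comparison raw, incommensurable relevance scores can't support.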