ZedneWeb: September 4, 2002

The BlogMD initiative continues to gather steam. It seems my quiet lobbying for my thread description language has indeed been noticed: In his first week wrap-up, N.Z. Bear writes, “If I could make such a thing as ‘required reading’, Dave’s document would definitely qualify.”

Gosh. I guess I should get to work on that coding convention.

Syndication and its discontents

An author discovers that his weblog is being reprinted at another site. Some investigation reveals that this website is reading the RSS feed his authoring tool generates automatically (he was unaware of this feature) and using it to generate the mirror. He is furious to see his work republished without his permission, with the formatting all screwed up, and a different copyright notice.

Is he right to be upset? Yes and no. Certainly, Nuzee (the site doing the reprinting) should have been clearer about what was copyrighted by whom. Certainly, Salon and UserLand (the hosting service and the creator of the authoring software, respectively) should have been clearer about the existence of the RSS feed and its implications. But is it right to equate this with stealing Mr Farr’s writing?

Before I get into that, let me explain what exactly is going on. RSS is a standard for summarizing web sites. It’s based on the “Channels” concept which Microsoft introduced in Internet Explorer 4, and then abandoned when it turned out to be mostly useless. Netscape, which was still a major player at the time, introduced its own channel standard for its “My Netscape” service. A “channel” corresponds to a web site, or part of a web site, and it contains a number of “items”. For example, a newspaper’s channel would have items for its current articles. Initially, items consisted only of a title and a URL, but subsequent standards have added further information.
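To make the channel-and-items structure concrete, here is a minimal sketch in Python. The feed text is invented for illustration (the newspaper, its URLs, and the helper name list_items are all assumptions), and it uses only the early-style title and link elements described above.

```python
import xml.etree.ElementTree as ET

# A made-up channel for a made-up newspaper. Each item carries only
# a title and a link, as in the earliest versions of the format.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Gazette</title>
    <link>http://example.com/</link>
    <description>A made-up newspaper</description>
    <item>
      <title>City council meets</title>
      <link>http://example.com/council</link>
    </item>
    <item>
      <title>Rain expected</title>
      <link>http://example.com/weather</link>
    </item>
  </channel>
</rss>
"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in the channel."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


if __name__ == "__main__":
    for title, link in list_items(SAMPLE_FEED):
        print(title, "->", link)
```

Everything a reader of the feed learns about the site is in that list of pairs; the articles themselves stay on the site.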

(There is a divide in the RSS community between those who view it as an XML file format and those who see it as an RDF vocabulary. It’s deep enough that the two sides disagree about what RSS stands for. Fortunately, it has no bearing on this article, so I can skip the details.)

Here’s what happens. Someone with a website generates an RSS summary for their site. They put this in a file (often called a “feed”) and make it available on their web site. (The ZedneWeb feed, for example, lists the recent weblog posts.) This is referred to as “syndicating”, which I feel is misleading as it implies a more active role. Really, an RSS feed is nothing more than a file on a web server, no different from any other web page. The terms “syndicate” and “publish” imply an effort to distribute information, but all that is really being done is making things available. The web server waits for someone to request a page, and when they do it sends them a copy.
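A rough sketch of how passive this is, using only Python's standard library. The file name index.rss, the helper name fetch_feed_once, and the local test server are assumptions made for the demonstration; a real site would simply leave the file where its ordinary web server can find it.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def fetch_feed_once(feed_text):
    """Put a feed file in a directory, serve it, and request it once."""
    with tempfile.TemporaryDirectory() as root:
        # "Syndicating" amounts to this: a file sitting on a web server.
        Path(root, "index.rss").write_text(feed_text, encoding="utf-8")

        handler = functools.partial(QuietHandler, directory=root)
        # Port 0 asks the operating system for any free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            port = server.server_address[1]
            # Bypass any configured proxy so the request stays local.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            url = f"http://127.0.0.1:{port}/index.rss"
            # The server does nothing until this request arrives.
            with opener.open(url, timeout=5) as response:
                return response.read().decode("utf-8")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(fetch_feed_once("<rss version='0.91'><channel/></rss>"))
```

Nothing in the server half of this knows or cares who is asking, or what they plan to do with the copy they receive.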

Back to Mr Farr. A debate at Blogroots asks whether it is ethical to republish a weblog on another web site without permission. Well, let’s examine what’s going on a little more closely. Recent versions of RSS allow each item to have a description. Sites like Nuzee use this description to supplement the item title, giving readers more information about the item before they go and read it. However, many weblogs use this feature to distribute the content of their posts. Thus, Nuzee can’t tell whether the description of a given item describes the item or is the item. (This is why I gave the thread description language separate properties for description and content.)
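The ambiguity is easy to see from the aggregator's side. The following Python sketch uses an invented feed with two invented posts; the function name render_listing is also an assumption, and this is not a claim about how Nuzee is actually implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: post A uses the description as a summary,
# post B puts its whole text there. The markup is identical.
FEED = """<rss version="0.92"><channel>
<title>Two hypothetical posts</title>
<link>http://example.com/</link>
<description>Sample data</description>
<item>
<title>Post A</title>
<link>http://example.com/a</link>
<description>A one-line summary of post A.</description>
</item>
<item>
<title>Post B</title>
<link>http://example.com/b</link>
<description>The entire text of post B, first paragraph. Second paragraph. And so on to the end.</description>
</item>
</channel></rss>"""


def render_listing(feed_xml):
    """Return the lines a naive aggregator would display for each item."""
    lines = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title")
        description = item.findtext("description", default="")
        # Nothing here says whether the description is a teaser or the
        # whole post, so both kinds are treated the same way.
        lines.append(f"{title}: {description}")
    return lines


if __name__ == "__main__":
    for line in render_listing(FEED):
        print(line)
```

The reader of the feed ends up republishing post B in full without ever having been told that is what it was doing. A format with separate fields for summary and content would let it show the first and leave the second alone.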

Having said that, is it ethical? I don’t know. My sympathy lies with those who say the greater breach was Salon’s and UserLand’s, for inappropriately configuring the RSS generator and not explaining its implications. Using terms like “theft”, “abuse”, and “appropriation” seems excessive to me; Gary Burge of Nuzee makes it clear that this is not his intent.

The issue of controlling one’s work comes up, much as it did during the smart tags debate. The details are murky. On one end of the spectrum, it’s clearly wrong to claim authorship of another’s work. On the other, it’s clearly fine that you have a copy of this article on your computer right now (otherwise, you wouldn’t be able to read it). Between them are things like HTTP proxies, web caches, mediators like Crit, and services like Nuzee. I think all of those are fine, but you, and the courts, might disagree.

Once your data leaves your computer, no technology will let you control what happens to it absolutely. (Things like Palladium will make it harder to break that control, but not impossible.) We must rely on ethics, laws, and social mores to determine which of the things we can do we should not do. Let’s resist the temptation to act in haste before we understand the implications. (via Mark Pilgrim)

Found after writing this article: Shelley Powers’s comments on the RDF/XML divide in RSS and the ethics of republishing syndicated content.