Thread Description Language

This document describes an RDF vocabulary for describing threaded discussions.

Software which works with discussion threads may use this vocabulary as a standard way to exchange threading information. In addition, the vocabulary can be used to store any number of posts from any discussion forum in a standard way. All discussion venues are treated equally, so data from multiple media may be combined into a single data set. This allows one to treat all weblogs and message boards as though they were a single forum.

In addition, this vocabulary includes certain properties specific to weblogs which can be used to describe some common relations, such as the recommendations many weblogs make (the “blogroll”).

Table of contents

  1. Background
  2. Note on namespaces
  3. Methods
    1. Weblogs
    2. Linear message boards
    3. Forked message boards
    4. Weblogs with comments
    5. Usenet
    6. E-mail messages
    7. Instant messages
    8. Web-based archives
  4. Classes
  5. Properties
    1. Cataloging
    2. Membership and containment
    3. References
    4. Sequence
    5. Content
    6. Weblog-specific
  6. Profiles


Background

The goal of the thread description language is to describe all forms of on-line discussion, including weblogs, message boards, Usenet, e-mail, instant messages, and anything else which can be described in terms of posts.

The basic unit of an on-line conversation is the Post. A discussion comprises a set of posts by various authors which are related to each other. This set of related posts is a thread. Structurally, threads can be either implicit or explicit, and they may be linear or forked.

In an explicit thread, the posts which compose the thread are marked as being part of the same thread. Implicit threads, however, must be derived from the pattern of references found among a set of posts.
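One way to derive implicit threads is to treat references as edges of a graph and take each connected group of posts as a thread. The following is only a minimal sketch of that idea: the function name and the input shape (a dictionary mapping each post URI to the URIs it refers to) are assumptions made for illustration and are not part of the vocabulary.

```python
from collections import defaultdict

def implicit_threads(references):
    """Group posts into threads: the connected components of the reference graph.

    `references` maps each post URI to the URIs it refers to (hypothetical input).
    """
    neighbours = defaultdict(set)
    for post, targets in references.items():
        neighbours[post]  # make sure posts without references still appear
        for target in targets:
            neighbours[post].add(target)
            neighbours[target].add(post)

    seen, threads = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, thread = [start], set()
        while stack:  # depth-first walk over everything reachable from `start`
            node = stack.pop()
            if node in thread:
                continue
            thread.add(node)
            stack.extend(neighbours[node] - thread)
        seen |= thread
        threads.append(sorted(thread))
    return threads
```

Real applications would probably want finer rules (for example, ignoring references that merely mention a resource), but the grouping step would look much the same.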

In a linear thread, each post follows another, as one finds in an instant message conversation or “unthreaded” message board. In a forked thread, any given post may be followed by multiple posts, forming a tree of responses. For maximum flexibility, posts in a forked thread may also follow multiple posts, in addition to being followed by multiple posts.

The Thread Description Language is a set of RDF classes and properties which are used to describe discussion threads and forums. By implementing it in RDF, we gain interoperability with other vocabularies, extensibility, and a well-defined serialization format, RDF-XML.

Each post is identified by a URI, and relations between posts are implemented by properties. To represent the connections between posts in a linear thread, we use the sequence properties, which are “next” and its variants. For a forked thread, we use the reference properties, which are “refersTo” and its variants.

- goal is to describe all forms of discussion threads:
  weblogs, msg boards, usenet, mailing lists, &c
- use RDF as a standard way to describe things, not necessarily
  for internal representation
- useful to have a standard way to express "X is last post of Y", even if
  it's not useful to store it
- RDF-XML is a handy way to store these facts in a file
- this means that TDV can represent any discussion forum - ThreadML

Uses and applications:
- A model for those developing discussion-thread-based software
- Exchange format for threading information extracted from blogthreads,
  message boards, etc.
- File format for storing threads from message boards, usenet, IM, etc

Examples, a linear thread, a weblog.

Note on namespaces

The RDF-XML examples given here all make use of a common set of namespace declarations. For brevity, they are omitted in the examples themselves.

Prefix  URI                                               Note
(none)  http://www.eyrie.org/~zednenem/2002/web-threads/  The thread description vocabulary
rdf     http://www.w3.org/1999/02/22-rdf-syntax-ns#       The core RDF vocabulary
dc      http://purl.org/dc/elements/1.1/                  The Dublin Core cataloging vocabulary
html    http://www.w3.org/1999/xhtml                      XHTML 1

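This is equivalent to wrapping each example in the following element:

  <rdf:RDF
    xmlns="http://www.eyrie.org/~zednenem/2002/web-threads/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:html="http://www.w3.org/1999/xhtml">
    <!-- example code -->
  </rdf:RDF>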



Methods

The following examples give an idea of how existing discussion forums may be represented.


Weblogs

While individual weblogs might not be considered discussion forums, one can look at the weblog community as a whole as a giant, distributed discussion. This is enabled by the use of URI references to identify posts and hyperlinks to make references. Naturally, the permanent address of each post (often called the “permalink”, although that conflates the act of pointing with the object being pointed at) would be used to identify each post. The hyperlinks contained within the post serve as references. (This raises the question of how one determines which hyperlinks a given post contains. This is beyond the scope of this document, although the associated coding convention describes one such method.)

The universe of weblogs (or “blogosphere”) is implicitly threaded, and several proposals exist for standard ways of indicating explicit threads. Standard hyperlinks imply the “refersTo” property. (Again, the coding convention describes how other references may be specified.)

Any given weblog post belongs to a Weblog, although that membership may not be derivable from the post’s encoding. Most weblog posts are encoded in some variant of HTML, but for minimal confusion it is recommended that the value of the “content” property be given as an XHTML 1.1 fragment. This avoids incompatibility with RDF-XML, and should not result in any loss of information.

Linear message board

In a linear, or “unthreaded”, message board, each post follows another in chronological sequence. If individual posts can be assigned URI references, then they can be represented as explicit, linear threads.

Each thread is represented by a Topic, and the first post is identified by the Topic’s “first” property. The posts use “next” to indicate the following post. For an example of a Topic encoded this way, see this RDF-XML transcription of a Quick Topic thread.
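To make the traversal concrete, here is a small sketch of how software might list the posts of a linear Topic in order by following “first” and then “next”. The dictionaries standing in for the Topic and its Posts, the example URIs, and the function name are all assumptions for illustration.

```python
def linear_order(topic, posts):
    """Follow "first" and then "next" to list a linear thread's posts in order."""
    order, seen = [], set()
    current = topic.get("first")
    while current is not None and current not in seen:  # guard against cycles
        order.append(current)
        seen.add(current)
        current = posts.get(current, {}).get("next")
    return order

# Hypothetical data: one Topic with three Posts.
topic = {"first": "http://example.org/t/9#m1"}
posts = {
    "http://example.org/t/9#m1": {"next": "http://example.org/t/9#m2"},
    "http://example.org/t/9#m2": {"next": "http://example.org/t/9#m3"},
    "http://example.org/t/9#m3": {},
}
```

Calling linear_order(topic, posts) yields the three post URIs in posting order; a post without a “next” value ends the walk.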

Forked message board

In a forked, or “threaded”, message board, each post either begins a thread (represented here by a Topic) or replies to an existing post. They are explicitly threaded, because each post is associated with a specific thread. The threads themselves are forked, because posts may have multiple responses. (Some forked message boards also provide a chronological ordering for posts, allowing the thread to be viewed as a tree or a sequence.)

As with linear boards, each thread is represented by a Topic, and the first post in the thread is identified by the Topic’s “first” property. The posts relate to each other through “refersTo” (at a minimum; a message board can probably assume “commentsOn” or better for direct replies) and possibly also through “next”. The Topic will usually be part of a Forum. In large message boards, the Forums may themselves be organized into larger Forums.
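As an illustration, the tree view of a forked thread can be rebuilt from the Topic’s “first” post and the “refersTo” values of the replies. The sketch below makes simplifying assumptions: each reply refers to exactly one earlier post, the references contain no cycles, and the post identifiers and function name are made up for the example.

```python
from collections import defaultdict

def reply_tree(first, refers_to):
    """Nest each post under the post it replies to, starting at the first post.

    `refers_to` maps a post URI to the single URI it replies to (an assumed,
    simplified shape; the vocabulary itself allows several references).
    """
    children = defaultdict(list)
    for post, parent in refers_to.items():
        children[parent].append(post)

    def build(node):
        # Recursively attach the replies to each post, in sorted order.
        return {child: build(child) for child in sorted(children[node])}

    return {first: build(first)}

# Hypothetical thread: p2 and p3 reply to p1, and p4 replies to p2.
tree = reply_tree("p1", {"p2": "p1", "p3": "p1", "p4": "p2"})
```

Here the resulting nested dictionary places p4 under p2, and p2 and p3 side by side under p1.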

Weblog with comments

Many weblogs have a comments feature which allows readers to respond to posts within the weblog itself. These comments add to the pre-existing inter-weblog discussion, and can potentially be referenced themselves by other weblog or message board posts. Interestingly, one could consider news sites which feature “talkback” to be examples of this pattern.

A set of comments to a single weblog post constitutes an explicit thread which may be forked, linear, or both, depending on how the comment system is set up. In this case, the original post does double duty as a Post and a Topic. If the comments are linear, the Topic’s “first” property identifies the first response. Otherwise, the Posts in the Topic use “refersTo” to indicate whether they are responding to the weblog post or to another post in the Topic.

A minimal example to illustrate a Post/Topic:

<Post rdf:about="http://example.org/blog/455">
  <dc:title>What do you think?</dc:title>
  <inWeblog rdf:resource="http://example.org/blog"/>
  <first rdf:resource="http://example.org/blog/455#m1"/>
</Post>

<Post rdf:about="http://example.org/blog/455#m1">
  <inTopic rdf:resource="http://example.org/blog/455"/>
  <next rdf:resource="http://example.org/blog/455#m2"/>
</Post>

<Post rdf:about="http://example.org/blog/455#m2">
  <inTopic rdf:resource="http://example.org/blog/455"/>
</Post>

Note that “http://example.org/blog/455” is not explicitly identified as a Topic; this is all right, as most software would be able to figure it out, since it has the “first” property and is the value of several posts’ “inTopic” property. However, different implementations may present different information or the same information in different ways.

Note also that all three posts in that example are part of the same document. This is also not required. All that is important is that the three URI references are different.
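The kind of inference described above might look like the following sketch. It reads the striped RDF-XML directly with Python’s ElementTree rather than a real RDF parser, and it treats either clue (having a “first” property, or being the value of an “inTopic” property) as sufficient; both choices are assumptions of this example, and a stricter application might demand both clues.

```python
import xml.etree.ElementTree as ET

TDL = "http://www.eyrie.org/~zednenem/2002/web-threads/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = f"""<rdf:RDF xmlns="{TDL}" xmlns:rdf="{RDF}">
  <Post rdf:about="http://example.org/blog/455">
    <first rdf:resource="http://example.org/blog/455#m1"/>
  </Post>
  <Post rdf:about="http://example.org/blog/455#m1">
    <inTopic rdf:resource="http://example.org/blog/455"/>
  </Post>
</rdf:RDF>"""

def inferred_topics(xml_text):
    """Collect URIs that act as Topics without being declared as such."""
    root = ET.fromstring(xml_text)
    topics = set()
    for node in root:
        # A resource with a "first" property is treated as a Topic here...
        if node.find(f"{{{TDL}}}first") is not None:
            topics.add(node.get(f"{{{RDF}}}about"))
        # ...and so is any value of an "inTopic" property.
        for in_topic in node.findall(f"{{{TDL}}}inTopic"):
            topics.add(in_topic.get(f"{{{RDF}}}resource"))
    return topics
```

For the document above, the only inferred Topic is “http://example.org/blog/455”, even though it is declared as a Post.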


Usenet

Each Usenet message is required to have a unique message ID. This forms the basis of part of the news: URI scheme. A message with the ID “1998090902325900.WAA04282@example.org” has the address “news:1998090902325900.WAA04282@example.org”. Specific Usenet newsgroups are also given unique names such as “alt.example” or “rec.arts.tv.mst3k.misc”. These are similarly represented by the news: URI scheme as “news:alt.example” and “news:rec.arts.tv.mst3k.misc”. Message IDs always contain a commercial at-sign (“@”), and newsgroup names never contain one, so there is no possibility of confusing a message ID and a newsgroup name.
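A short sketch of these two rules follows. The helper names are hypothetical, and the code ignores any percent-encoding that the news: URI scheme may require for unusual characters.

```python
def news_uri(identifier):
    """Build a news: URI from a message ID or a newsgroup name."""
    return "news:" + identifier.strip("<>")

def is_message(uri):
    """Message IDs always contain "@"; newsgroup names never do."""
    return uri.startswith("news:") and "@" in uri

message = news_uri("<1998090902325900.WAA04282@example.org>")
group = news_uri("alt.example")
```

With these helpers, is_message(message) holds while is_message(group) does not, which is enough to decide whether a news: URI should be treated as a Post or as a newsgroup.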

Newsgroups are implicitly threaded. Each message contains a header specifying the posts it refers to (ie, posts earlier in its thread). This header corresponds to the “refersTo” property, and the messages listed are its values.

The newsgroups themselves are represented as Topics, and Usenet as a whole can be thought of as a Forum. Crossposted messages have multiple values for “inTopic”.

An example message posted to “alt.example” (some headers omitted):

Newsgroups: alt.example
Subject: Re: Test
From: Mr Example <example@example.org>
Date: 20 July 2002 19:08:16 -0800
Message-ID: <1998090902325900.WAA04282@example.org>
References: <v03007802af318a543ec6@hypothetical.com>

> This message is posted to alt.example.

Yes, that appears to be true.

This can be represented like so:

<Post rdf:about="news:1998090902325900.WAA04282@example.org">
  <dc:title>Re: Test</dc:title>
  <dc:creator>Mr Example &lt;example@example.org&gt;</dc:creator>
  <inTopic rdf:resource="news:alt.example"/>
  <refersTo rdf:resource="news:v03007802af318a543ec6@hypothetical.com"/>
  <refersTo rdf:resource="news:qumwws2zd6s.fsf@apocryphal.edu"/>
  <content xml:space="preserve">
&gt; This message is posted to alt.example.

Yes, that appears to be true.
</content>
</Post>

Although the Usenet headers only provide for references made between Usenet messages, applications are free to infer additional references from URIs contained in the message text.
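The header-to-property mapping used in this section could be automated along the following lines. This is a sketch only: it uses Python’s standard email parser, returns a plain dictionary rather than RDF, and the function name and dictionary keys are assumptions chosen to mirror the property names above.

```python
from email import message_from_string

RAW = """Newsgroups: alt.example
Subject: Re: Test
Message-ID: <1998090902325900.WAA04282@example.org>
References: <v03007802af318a543ec6@hypothetical.com>

Yes, that appears to be true.
"""

def post_properties(raw):
    """Map Usenet headers onto thread-description properties (as a plain dict)."""
    msg = message_from_string(raw)

    def as_uri(value):
        # Strip whitespace and angle brackets, then add the news: scheme.
        return "news:" + value.strip().strip("<>")

    return {
        "about": as_uri(msg["Message-ID"]),
        "dc:title": msg["Subject"],
        # Crossposted messages list several comma-separated newsgroups.
        "inTopic": [as_uri(g) for g in msg["Newsgroups"].split(",")],
        # The References header lists message IDs separated by whitespace.
        "refersTo": [as_uri(r) for r in msg.get("References", "").split()],
    }

props = post_properties(RAW)
```

References found in the message text itself would have to be added by a separate step, since the headers only cover references to other Usenet messages.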

[@@ there should be a note somewhere about MIME and content]

E-mail messages

Like Usenet messages, e-mails contain message identifiers which are required to be unique. These form the basis of the mid: URI scheme, as with “mid:528FA5637F7A16419AD8FC006128E6DCBC6836@example.org”. Thus, they too can be represented as posts. Additionally, some mail clients include a references header when making a reply (although this is not common practice) which allows for some “refersTo” properties to be inferred.

E-mail as a whole is implicitly threaded (to the extent that references can be inferred from context). Mailing lists can be represented as Topics, particularly if they include the List-URL header or otherwise have a unique address. Because mailing lists are centrally managed, they can have sequential and forked threading.

Instant Messages

While there do not appear to be any standards for identifying instant messages, there are some defined URN schemes which are sufficiently decentralized to be useful. One could, for example, assign each instant message a UUID, which is defined in such a way that it is highly improbable for two items to be given the same identifier.

Instant messages are implicitly threaded and sequential, but they can be organized into Topics. One (non-optimal) way to do it is to explicitly identify the topic when saving an instant messaging conversation: all the messages being saved are considered part of the Topic. The Topic is given a UUID, and each message is declared a Post. The Posts can be given fragment identifiers based on the Topic’s address.

An example of a very brief conversation:

<Topic rdf:about="urn:uuid:1234-5678-90abcdef1234-5678">
  <dc:title>IM conversation between Mr H and Mr A</dc:title>
  <dc:contributor>Mr H</dc:contributor>
  <dc:contributor>Mr A</dc:contributor>
  <first rdf:resource="urn:uuid:1234-5678-90abcdef1234-5678#m1"/>
</Topic>

<Post rdf:about="urn:uuid:1234-5678-90abcdef1234-5678#m1">
  <dc:creator>Mr H</dc:creator>
  <inTopic rdf:resource="urn:uuid:1234-5678-90abcdef1234-5678"/>
  <next rdf:resource="urn:uuid:1234-5678-90abcdef1234-5678#m2"/>
  <content rdf:parseType="Literal">
    Gosh, I <html:b>hate</html:b> the rain.
  </content>
</Post>

<Post rdf:about="urn:uuid:1234-5678-90abcdef1234-5678#m2">
  <dc:creator>Mr A</dc:creator>
  <inTopic rdf:resource="urn:uuid:1234-5678-90abcdef1234-5678"/>
  <content rdf:parseType="Literal">
    Without it, the flowers would die.
  </content>
</Post>

This method for representing instant messaging conversations is only one possible way to apply the thread description language, and it does have the disadvantage that, if both parties export the conversation, they will assign different URIs to the Topic and Posts. While a superior method will undoubtedly present itself in the future, this one is good enough to put IM on the same footing as weblogs, message boards, Usenet, and e-mail.
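An export step of this kind might be sketched as below. The function name, the (author, text) input shape, and the “#m1”, “#m2” fragment pattern are assumptions for illustration; the sketch also shows why two separate exports disagree, since each call draws a fresh random UUID.

```python
import uuid

def export_conversation(messages):
    """Give the Topic a fresh UUID-based URI and its Posts fragment URIs.

    `messages` is a list of (author, text) pairs (a hypothetical input shape).
    """
    topic_uri = uuid.uuid4().urn  # "urn:uuid:..."; different on every export
    posts = []
    for number, (author, text) in enumerate(messages, start=1):
        posts.append({
            "about": f"{topic_uri}#m{number}",
            "dc:creator": author,
            "inTopic": topic_uri,
            "content": text,
        })
    # Link each post to the one that follows it.
    for post, following in zip(posts, posts[1:]):
        post["next"] = following["about"]
    topic = {"about": topic_uri, "first": posts[0]["about"] if posts else None}
    return topic, posts

topic, posts = export_conversation([
    ("Mr H", "Gosh, I hate the rain."),
    ("Mr A", "Without it, the flowers would die."),
])
```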

Web-based archives

While weblog and message-board authors are free to link directly to Usenet and e-mail messages, they generally will not because browsers cannot dereference news: and mid: URIs. Thus, a method is needed to identify some URIs as aliases for other URIs. The appropriate choice here is probably daml:sameIndividualAs [@@sp?].


Classes

We define five types of resources for dealing with threading. One, Post, is essential. The other four are collections of posts with various uses and properties.

Post
  The fundamental atom of discussion. A post may be a single posting to a weblog or message board, a Usenet message, an e-mail, an individual statement in an IM conversation, or anything else that can be part of a thread and can be assigned a URI reference. Posts typically have a single author and do not change over time. Posts may appear in multiple locations (such as the archives and front page of a weblog). The location specified by their address is their permanent location; others are possibly-temporary mirrors.
Archive
  A resource where one or more posts are located. Multiple posts appearing as part of the same archive are distinguished by fragment identifiers. Aside from being repositories for posts, archives have no major significance. A resource may be a post and an archive at the same time.
Topic
  An explicitly-declared thread. Topics group posts not by their location (as archives do) but by some common relation, such as being a direct or indirect response to a resource. Topics correspond to message board threads and to the commenting features supported by some weblogs. In the latter case, the post to which the comments are made can also be the topic.
Forum
  A collection of topics or sub-forums, such as a message board which supports multiple threads. Some message boards separate topics into broad categories; they can be viewed as a forum containing multiple sub-forums, each of which contains several topics. The URI used to identify a forum should resolve to an introductory page which might list sub-forums or a selection of topics. [@@ needs tweaking; newsgroups should qualify as forums, what about mailing lists?]
Weblog
  A set of posts controlled by a single authority. Weblogs are a popular form of personal web site used for publishing essays, pointing to interesting web resources, self-promotion, or many other uses. Weblogs have a number of common features which are described later. The URI used to identify the weblog is also the address of the weblog’s front page, which frequently mirrors the most recent posts.

[@@ describe Post/Archives, Post/Topics, and Post/Archive/Topics more]



Properties

Cataloging

Rather than create a new vocabulary to describe common properties such as titles, authors’ names, and so forth, we specify use of the Dublin Core Metadata Element Set. [@@ namespace, use of “dc:”]

Some of the Dublin Core elements likely to be applied to posts, topics, forums, and weblogs are:

dc:title
  The title, name, or subject line of a post, topic, forum, or weblog. As a rule, this should contain only information unique to the resource, so “Re: Nixon’s dog” is fine but “SoAndSo Discussion Forum: Re: Nixon’s dog” is probably not.
dc:creator
  A string identifying the author or authors of a post, or the creator of a topic, forum, or weblog. This could be a name, a nickname, or some other identifying string. (If the intent is for others to know what it means, don’t be too clever.)
dc:date
  A string specifying the publication time of a post. It should be formatted according to the ISO 8601 profile specified in the W3C date/time note as a day, minute, or second. Times are interpreted to mean “sometime in that period”, not “the start of that period”. Thus, “2002-07-20” means any time during July 20, 2002. (Note that timezone information is required for minutes and seconds.)
dc:description
  A string or XHTML fragment describing a post, topic, forum, or weblog. Note that this should not be used to present the content of the resource; use “content” for that. See the description of “content” for ways of including complex strings in RDF-XML.
dc:contributor
  A string identifying someone who contributed to a thread, forum, or weblog. Similar to dc:creator in syntax.
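The “sometime in that period” reading of dc:date can be made concrete by turning each value into a half-open interval. The sketch below is an assumption-laden illustration: the function name is made up, it accepts only days, minutes, and whole seconds, and it expects a numeric “+hh:mm” or “-hh:mm” offset rather than “Z”.

```python
from datetime import datetime, timedelta

def date_period(value):
    """Return (start, end) of the period named by a day, minute, or second."""
    if "T" not in value:                       # a day, e.g. "2002-07-20"
        start = datetime.strptime(value, "%Y-%m-%d")
        return start, start + timedelta(days=1)
    start = datetime.fromisoformat(value)      # expects a numeric offset
    if start.tzinfo is None:
        raise ValueError("timezone is required for minutes and seconds")
    clock = value.split("T", 1)[1][:-6]        # drop the trailing "+hh:mm"
    # "hh:mm" names a minute; "hh:mm:ss" names a second.
    step = timedelta(minutes=1) if len(clock) == 5 else timedelta(seconds=1)
    return start, start + step
```

For “2002-07-20” this gives a span of one day, while “2002-07-20T19:08-08:00” gives a span of one minute, so two dates can be compared by checking whether their periods overlap.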

Membership and containment

Four relations indicate that a resource is part of or belongs to a larger resource.

The archive where a post is permanently located.
A topic to which a post belongs.
A forum to which a post, archive, topic, or smaller forum belongs.
A weblog to which a post, archive, topic, or forum belongs.

Five relations indicate smaller resources contained within a larger one.

A post located in or part of an archive, topic, forum, or weblog.
An archive belonging to a forum or weblog.
A topic that is part of a forum or weblog.
A forum that is part of a larger forum or weblog.
A weblog that is part of some larger resource.

[@@ what is lost if these are reduced to just “in” and “has”?]


References

These properties apply to a post and describe the references it makes.

A resource to which this post refers.
A post which this post corrects or updates.
A resource which this post discusses or responds to.
A resource which this post agrees with or amplifies.
A resource which this post rebuts or presents evidence contrary to.
A resource which this post refers to but does not discuss.
A resource which this post quotes.

These properties apply to any resource and identify a post which refers to it in some manner.

A post which refers to this resource.
A post which updates or corrects this post.
A post which discusses or responds to this resource.
A post which agrees with or amplifies this resource.
A post which rebuts or presents evidence contrary to this resource.
A post which refers to this resource but does not discuss it.
A post which quotes this resource.
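Because each inverse property carries the same information as its forward counterpart, software that only records outgoing references can compute the incoming ones when needed. A minimal sketch, assuming the references are held in a dictionary from post URI to the list of resources it refers to (the function name and data are hypothetical):

```python
from collections import defaultdict

def inverse_references(refers_to):
    """For each resource, list the posts that refer to it."""
    referred_by = defaultdict(list)
    for post, targets in refers_to.items():
        for target in targets:
            referred_by[target].append(post)
    return dict(referred_by)

# Hypothetical data: two posts refer to the same article.
incoming = inverse_references({
    "http://example.org/blog/1": ["http://example.com/article"],
    "http://example.org/blog/2": ["http://example.com/article",
                                  "http://example.org/blog/1"],
})
```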


Sequence

In addition to the graph formed by inter-post references, posts can also be organized in an order, as occurs in a linear thread.

The first post or archive of a topic, forum, or weblog.
The last post or archive of a topic, forum, or weblog.
A post or archive which follows this post or archive in a topic, forum, or weblog.
A post or archive which precedes this post or archive in a topic, forum, or weblog.


Content

There are several useful applications which require representing the actual content of a post, such as storing a thread in a self-contained file. Rather than define a new file format, we stretch the meaning of “metadata” slightly and declare the “content” property.

An XML fragment containing the content unique to this post.

Note that the value is described as an XML fragment, not a text string. This is because the content of many posts will be best described in XML (or languages such as HTML which have XML equivalents).

Some guidelines are in order to avoid a situation like RSS, where HTML is escaped and re-encoded in XML. To represent arbitrary XML content in RDF-XML, RDF defines the rdf:parseType="Literal" attribute, which tells the RDF parser that the contents of an element should not be parsed for further RDF statements.

<Post rdf:about="http://example.org/blog/455">
  <content rdf:parseType="Literal">
    <html:p>This post contains two paragraphs.</html:p>
    <html:p>This is the <html:em>second</html:em> paragraph.</html:p>
  </content>
</Post>

In this particular example, the post’s content is an XHTML fragment (assuming that the “html” namespace prefix is defined appropriately elsewhere in the document). Implementers should be aware of two points:

  1. The meaning of an XML fragment is dependent on what namespace prefixes are declared. Thus, regular expressions and other text-based, non-parsing approaches to working with XML will not always work as expected. Similarly, HTML content must be expressed in well-formed XML (this can be done with no loss of information, because XHTML includes all HTML elements).
  2. The content of the post must survive XML processing, so any elements containing semantic whitespace (i.e., where spacing is important) must warn the parser that the spacing is significant by using xml:space="preserve". This includes the HTML pre element, as one can’t expect general XML tools to have special knowledge of the XHTML namespace.

Character strings containing no XML markup can still be considered XML fragments, which is useful for describing posts such as e-mail and Usenet messages. Because the post content will undergo XML parsing, any reserved characters (“<”, “>”, and “&”) must be escaped, and the xml:space="preserve" attribute should be used to preserve whitespace-based formatting. To produce readable markup, applications may insert newlines before and after the post content. (If a post begins or ends with a newline and that newline is considered important, then an additional newline must be inserted so that parsers will not strip it out.)

<Post rdf:about="mid:1234@example.org">
  <content xml:space="preserve">
Mr Hypothetical writes:
&gt; Where is the AT&amp;T website?

Try &lt;http://www.att.com/&gt;.

</content>
</Post>

Software which understands the content property would discard the initial and final newlines, leaving this message (newlines are marked “\n”):

Mr Hypothetical writes:\n
> Where is the AT&T website?\n
\n
Try <http://www.att.com/>.\n

Note that the final </content> is not indented. This is because the last character in the element is a newline. If it had been indented, then the two newlines and the whitespace used to indent the tag would have been included in the post content.
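The padding and escaping rules can be exercised with a short round trip. This sketch uses Python’s ElementTree, which escapes “<”, “>”, and “&” when serializing; the two helper names are hypothetical, and the element is written without the surrounding Post or any namespace declarations to keep the example small.

```python
import xml.etree.ElementTree as ET

def content_element(text):
    """Build a <content> element; the serializer escapes "<", ">" and "&"."""
    element = ET.Element("content", {"xml:space": "preserve"})
    element.text = "\n" + text + "\n"  # padding newlines keep the markup readable
    return element

def post_text(element):
    """Recover the post text by discarding one initial and one final newline."""
    text = element.text or ""
    if text.startswith("\n"):
        text = text[1:]
    if text.endswith("\n"):
        text = text[:-1]
    return text

original = ("Mr Hypothetical writes:\n"
            "> Where is the AT&T website?\n"
            "\n"
            "Try <http://www.att.com/>.\n")
serialized = ET.tostring(content_element(original), encoding="unicode")
recovered = post_text(ET.fromstring(serialized))
```

Because exactly one newline is added and removed at each end, a message that itself ends with a newline keeps it after the round trip.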

The purpose of the content property is to represent the content of a post, so Usenet and e-mail headers should not be included. If the information in the headers is deemed important and not covered by an existing RDF property, then a new property should be created.


Weblog-specific

These properties identify elements found in many weblogs.

recommends
  A resource, such as another weblog, which is linked to in a prominent place in a weblog (often called the “blogroll”).
linksPage
  A page which lists recommended sites, often but not always the same as the front page of a weblog.
currentPosts
  A sequence (rdf:Seq) of posts which are considered to be “current”. For example, the posts currently present in the weblog’s front page can be considered current.
rssChannel
  An RSS feed which may be associated with the weblog, usually to list or syndicate current posts.


Profiles

[@@Note: this whole section is pretty experimental]

Because RDF and the thread description language are so flexible, it is not enough simply to state that a resource contains metadata for a weblog or thread. Different applications will use different subsets of the universe of possible statements one could make. For example, someone describing the current posts in a weblog can choose to include the content of those posts or only the information about them. Profiles provide a way to indicate what data is being specified.

Profiles are identified by URI. The profiles described in this document are:

The “current” posts in a weblog and their content. (Similar to RSS when used for syndication.)
The “current” posts in a weblog, but not their content. (Similar to RSS when used as a summary.)
General data about a weblog, such as locations of alternate “feeds”.
The weblog’s name and the resources it recommends.
Every post in a thread or topic and their content.
Every post in a thread or topic, but not their content.

We define a “profile” property, which is used to indicate the profile of a given resource. This can be used to distinguish multiple RDF-XML formatted alternate versions of a resource.

For example, assume the weblog “http://example.org/blog” has three XML feeds in addition to its default HTML representation: an RSS feed “http://example.org/blog.rss”, a description of the current posts in thread description language “http://example.org/blog.rdf”, and the description and content of the current posts in thread description language “http://example.org/blog.synd”. These three resources are all RDF-XML documents and all describe the current state of the weblog.

A fifth resource, “http://example.org/blog.meta”, gives metadata about the weblog itself, such as its name and the existence of the three available XML versions. It does so using the concept of representations discussed in “Generic Resources” and using the “profile” property to distinguish them. Each is considered a representation of the weblog itself; that is, the weblog presented in an alternate format.

<Weblog rdf:about="http://example.org/blog">
  <dc:title>Example weblog</dc:title>
  <rssChannel rdf:resource="http://example.org/blog.rss"/>
</Weblog>

<u:RepresentationInvariant rdf:about="http://example.org/blog.meta">
  <u:isRepresentationOf rdf:resource="http://example.org/blog"/>
  <profile rdf:resource="

<u:RepresentationInvariant rdf:about="http://example.org/blog.rdf">
  <u:isRepresentationOf rdf:resource="http://example.org/blog"/>
  <profile rdf:resource="

<u:RepresentationInvariant rdf:about="http://example.org/blog.synd">
  <u:isRepresentationOf rdf:resource="http://example.org/blog"/>
  <profile rdf:resource="

(Ideally, profiles would be indicated in the MIME-type itself, rather than requiring a separate profile property. Admittedly, “application/rdf+xml; profile=http://www.eyrie.org/2002/web-threads/#prof-blog-data” is cumbersome, but it would allow the use of profiles in HTTP content negotiation.)

Syndicating weblogs

RSS is commonly used to present weblogs in a standard format, allowing for more flexible treatment of their content. There are several software packages which display the current headlines of a site based on its RSS feed, for example.

Unfortunately, RSS is used both as a way to describe the content of a web site and as a way to present that content. It is impossible to tell which strategy a given feed uses. Worse yet, the attempts to use RSS for presenting content usually involve mis-using the “description” property and encoding HTML content as though it were a text string.

The thread description language can also be used to syndicate weblogs, and we provide two profiles to distinguish feeds which describe posts from feeds which describe and contain posts.

Both profiles require a Weblog resource for each weblog being described. These Weblogs MUST include the dc:title and currentPosts properties, and SHOULD include dc:description, dc:creator, and dc:contributor as appropriate. The value of currentPosts is a sequence of Posts which are considered “current”.

Both profiles also require a Post resource for each post listed in currentPosts. These Posts MUST include dc:title and dc:date and SHOULD include dc:creator and dc:contributor as appropriate. Each post SHOULD also note the resources it references. Posts are not required to note resources which refer to them, as that information may be unavailable or extensive.

Posts are not required to indicate sequence or membership in Archives or Weblogs; that information is implicit. Similarly, the Weblog should not indicate which archives, forums, topics, or posts it contains.

For the #prof-blog-synd-content profile, each Post MUST include the content property. For the #prof-blog-synd-data profile, each Post MUST NOT include the content property.
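A consumer could check the MUST and MUST NOT requirements on Posts with something like the sketch below. The representation of a Post as a dictionary keyed by property name, the function name, and the wording of the messages are assumptions for illustration; the SHOULD-level requirements are deliberately not checked.

```python
SYND_CONTENT = "#prof-blog-synd-content"
SYND_DATA = "#prof-blog-synd-data"

def check_post(post, profile):
    """Return a list of problems for one Post under a syndication profile.

    `post` is a plain dict keyed by property name (a hypothetical representation).
    """
    problems = []
    # Both syndication profiles require a title and a date on every Post.
    for required in ("dc:title", "dc:date"):
        if required not in post:
            problems.append(f"missing {required}")
    # The two profiles differ only in their treatment of "content".
    if profile == SYND_CONTENT and "content" not in post:
        problems.append("missing content")
    if profile == SYND_DATA and "content" in post:
        problems.append("content not allowed")
    return problems

summary_post = {"dc:title": "What do you think?", "dc:date": "2002-07-20"}
```

An empty list means the Post satisfies the checked requirements; the same Post can pass one profile and fail the other.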

Describing weblogs

While the syndication profiles focus on a subset of a weblog’s posts, the #prof-blog-data and #prof-blogroll profiles describe the weblog itself.

The #prof-blog-data profile is used to describe the weblog as a whole; no information about individual posts is given. It requires a Weblog resource for each weblog being described. These Weblogs MUST include the dc:title property and SHOULD include dc:description, dc:creator, dc:contributor, linksPage, rssChannel, recommends, hasForum, and hasTopic as appropriate.

The hasTopic property is used to indicate categories which the weblog uses to organize posts. There should be a Topic resource for each category. Each Topic MUST include a dc:title property and SHOULD include dc:description.

The #prof-blogroll profile is a subset of #prof-blog-data used to describe which resources a given weblog recommends. It requires a Weblog resource for each weblog being described. Each Weblog MUST include the dc:title property and a recommends property for each recommended resource.

Representing threads

[@@ not yet written]

Dave Menendez