ZedneWeb: Thread Description Language

Introduction

The Thread Description Language (TDL) is an RDF vocabulary for describing threaded discussions, including:

inter-weblog discussions (sometimes called “blogthreads”)
weblog comment systems
message boards
Usenet
e-mail exchanges
instant messaging

Styles of threading

TDL represents discussion threads as collections of linked posts, where a post is a self-contained, atomic contribution to the discussion. The nature of posts varies somewhat between types of discussion, but a post is almost always the work of a single participant. Weblog and message board posts and Usenet, e-mail, and IM messages are a few examples of posts.

Different types of discussions will organize their posts in different ways. TDL is designed to represent as many styles as possible using common features. (Because “thread” means different things in different places, it is not used in the vocabulary itself.)

Usenet and inter-weblog discussions make use of implicit threading. Rather than formally declaring threads, each post makes references to one or more posts that precede it in the discussion. With a collection of posts and information about their references, it is possible to arrange these posts in a tree or graph structure representing the discussion.

In TDL, the references between posts are indicated with the property refersTo.
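As a minimal sketch of how a consumer might rebuild an implicit thread from such references, the Python below indexes posts by the posts they refer to and finds the thread's starting points. The post names (a, b, c, d) and the helpers build_reply_index and roots are hypothetical illustrations, not part of TDL; only the property name refersTo comes from the vocabulary.

```python
from collections import defaultdict

# Hypothetical statements of the form (post, "refersTo", referenced post).
statements = [
    ("b", "refersTo", "a"),
    ("c", "refersTo", "a"),
    ("d", "refersTo", "b"),
    ("d", "refersTo", "c"),  # a post may refer to more than one earlier post
]

def build_reply_index(statements):
    """Map each referenced post to the sorted list of posts that refer to it."""
    index = defaultdict(set)
    for subject, predicate, obj in statements:
        if predicate == "refersTo":
            index[obj].add(subject)
    return {post: sorted(replies) for post, replies in index.items()}

def roots(statements):
    """Posts that are referred to but do not themselves refer to anything."""
    subjects = {s for s, p, _ in statements if p == "refersTo"}
    objects = {o for _, p, o in statements if p == "refersTo"}
    return sorted(objects - subjects)

reply_index = build_reply_index(statements)
thread_roots = roots(statements)
```

Because d refers to both b and c, the result is a graph rather than a strict tree, which is why the index maps each post to a list of referring posts instead of giving each post a single parent.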

Many message board systems use a similar style called forked threading. Unlike implicit threads, forked threads are explicitly declared, as the centralized control of a message board allows one to know exactly which posts are part of the thread. This style is called forked because each post may be referenced by multiple posts, resulting in a branched, tree-like form for the thread. (Forked message boards are also frequently called “threaded”.)

In TDL, a specific collection of posts is represented as a Topic, and the lead post(s) in a topic can be indicated with initialPost.

The remaining message boards and IM conversations use linear threading. While forked and implicit threads arrange posts in graphs or trees based on which posts they refer to, linear threads arrange posts in a sequence, usually based on when the posts were created. (Linear message boards are frequently called “unthreaded”.)

In TDL, the property hasPosts indicates an ordered list of posts belonging to something, such as a Topic. [Issue 2]

These styles of threading are not mutually exclusive. The same thread may be viewable both as a sequence of posts and as a tree. Similarly, while individual discussions in Usenet or among weblogs are not explicitly declared, at another level newsgroups and weblogs can be viewed as explicitly declared threads.

The connections between posts and topics are many-to-many. A given topic will usually contain multiple posts—perhaps thousands—and a given post may belong to any number of topics. There are a number of ways to associate posts with a topic, but they fall into two broad categories.

A topic which has a value for hasPosts is closed. Only those posts which are part of the list given by hasPosts are part of the topic. This does not mean that the membership of the topic is fixed for all time. An RDF graph containing hasPosts is simply asserting that this set of posts and no others belong to the topic. It is up to implementations to decide which assertions to believe at what time.

In many cases, it is not practical or even possible to list every post in a topic. A topic which does not have a value for hasPosts is open. With an open topic, the only possible answers to the question “Is this post part of this topic?” are “Yes” and “Unknown”.
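The difference between closed and open topics can be sketched as a three-valued membership test. This is only an illustration under simplifying assumptions: the function name membership and its parameters are hypothetical, and a single graph is taken at face value without deciding which assertions to believe.

```python
def membership(post, has_posts=None, part_of=()):
    """Answer "Is this post part of this topic?" for a single topic.

    has_posts: the topic's hasPosts list, or None if the topic is open.
    part_of:   posts asserted to belong to the topic by other statements.
    """
    if has_posts is not None:
        # Closed topic: the list is exhaustive, so absence means "no".
        return "yes" if post in has_posts else "no"
    # Open topic: absence of a statement tells us nothing.
    return "yes" if post in part_of else "unknown"
```

For example, membership("c", has_posts=["a", "b"]) gives "no", while membership("c", part_of={"a"}) gives "unknown".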

Conventions used in this document

In this document, several prefixes are used to abbreviate references to existing namespaces. A term such as “dcterm:isPartOf” is actually shorthand for the URI reference “http://purl.org/dc/terms/isPartOf”. Terms with no prefix (or the empty N3 prefix “:”) are from the TDL namespace.

Namespace prefixes used in this document
Prefix	URI Reference	Comment
(none) or :	http://www.eyrie.org/~zednenem/2002/web-threads/	TDL
dc:	http://purl.org/dc/elements/1.1/	Dublin Core element set
dcterm:	http://purl.org/dc/terms/	Dublin Core qualifiers
dctype:	http://purl.org/dc/dcmitype/	DCMI resource types
rdf:	http://www.w3.org/1999/02/22-rdf-syntax-ns#	RDF Core
rdfc:	http://www.eyrie.org/~zednenem/2002/rdfchannel#	RDF Channel
rdfs:	http://www.w3.org/2000/01/rdf-schema#	RDF Schema

As always, it is the URI reference which is meaningful, not the particular prefix.
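The abbreviation scheme amounts to simple concatenation of a namespace URI and a local name. The sketch below shows this with the prefixes from the table above; the names PREFIXES and expand are hypothetical and chosen only for illustration.

```python
# Prefix table copied from this document; the empty key stands for TDL itself.
PREFIXES = {
    "": "http://www.eyrie.org/~zednenem/2002/web-threads/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterm": "http://purl.org/dc/terms/",
    "dctype": "http://purl.org/dc/dcmitype/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfc": "http://www.eyrie.org/~zednenem/2002/rdfchannel#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(term):
    """Expand a prefixed term such as "dcterm:isPartOf" to a full URI reference."""
    prefix, sep, local = term.partition(":")
    if not sep:
        # No colon at all: treat the whole term as a TDL local name.
        prefix, local = "", term
    return PREFIXES[prefix] + local  # raises KeyError for an unknown prefix
```

Under this sketch, "hasPosts" and ":hasPosts" expand to the same URI reference, matching the convention that unprefixed terms belong to the TDL namespace.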

The term “resource” is used in different ways by different standards on the web. To avoid confusion, this document uses the term item to refer to any node in an RDF graph about which statements can be made (that is, everything except literals).

An RDF graph is a set of statements, such as the statements made by a single file. A graph asserts the truth of the statements it contains. This graph may be inconsistent if it asserts statements which contradict each other. This document does not describe how these inconsistencies should be dealt with.

TDL is defined in terms of RDF statements, not any particular serialization of RDF. Examples in this document are given using the N3 syntax, but TDL data may be expressed in any appropriate syntax. (An easily-parsed subset of RDF/XML could be described in an appendix, given interest.)

Core vocabulary

Classes

Post: A single part of a discussion, such as a weblog or message board post, or Usenet, e-mail, or IM message.
Subclass of: dctype:Text
Topic: A set of posts, usually connected in some way, such as being a discussion thread, a category in a weblog, or a query response.
Subclass of: dctype:Collection

Properties

agreesWith: Indicates an item which this post agrees with or confirms.
Sub-property of: commentsOn
Domain: Post
categoryOf: Indicates that this topic is a subtopic of the specified topic. The posts in this topic are all part of the larger topic in the same order relative to each other.
Sub-property of: subtopicOf
Domain: Topic [Issue 2]
Range: Topic [Issue 2]
commentsOn: Indicates an item which this post discusses or responds to.
Sub-property of: refersTo
Domain: Post
concludingPost: Indicates a post which concludes this topic.
Sub-property of: dcterm:hasPart
Domain: Topic
Range: Post
content: Indicates a text or XML fragment representing the content of this post.
Domain: Post
currentPosts: Indicates a collection of posts which are considered current, such as the posts mirrored on the front page of a weblog.
Sub-property of: rdfc:current
Domain: Topic, rdfc:Channel
Range: rdf:List (containing Posts)
disagreesWith: Indicates an item which this post rebuts or presents evidence contrary to.
Sub-property of: commentsOn
Domain: Post
excerpt: Indicates a text or XML fragment representing some portion of the content of this post.
Domain: Post
followsUp: Indicates a post, usually from the same author or publisher, which this post updates or corrects.
Sub-property of: refersTo
Domain: Post
Range: Post
hasPosts: Indicates the set of posts which are logically part of this item (usually a Topic). Posts which are not in the list are not part of this item.
Sub-property of: dcterm:hasPart
Range: rdf:List (containing Posts)
hasTopics: Indicates the set of topics which are formally part of this item (usually a Topic).
Sub-property of: dcterm:hasPart
Range: rdf:List (containing Topics)
initialPost: Indicates a post which initiates this topic.
Sub-property of: dcterm:hasPart
Domain: Topic
Range: Post
partOf: Indicates a topic to which this post belongs.
Sub-property of: dcterm:isPartOf
Domain: Post
Range: Topic
pointsTo: Indicates an item which this post refers to but does not discuss.
Sub-property of: refersTo
Domain: Post
quotes: Indicates an item which this post quotes.
Sub-property of: refersTo
Domain: Post
refersTo: Indicates an item which this post references.
Sub-property of: dcterm:references
Domain: Post
segmentOf: Indicates that this topic is a subtopic of the specified topic. The posts in this topic are all part of the larger topic, in the same order and contiguously (with no other posts between them).
Sub-property of: subtopicOf
Domain: Topic [Issue 2]
Range: Topic [Issue 2]
subtopicOf: Indicates that this topic is a subtopic of the specified topic. The posts in this topic are all part of the larger topic.
Sub-property of: dcterm:isPartOf
Domain: Topic
Range: Topic
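Under RDF Schema, a statement made with a sub-property also entails the same statement made with each of its super-properties. The sketch below records the sub-property declarations listed above and walks the chain for a given property; the names SUB_PROPERTY_OF and super_properties are hypothetical.

```python
# Sub-property declarations transcribed from the property list above.
SUB_PROPERTY_OF = {
    "agreesWith": "commentsOn",
    "disagreesWith": "commentsOn",
    "commentsOn": "refersTo",
    "followsUp": "refersTo",
    "pointsTo": "refersTo",
    "quotes": "refersTo",
    "refersTo": "dcterm:references",
    "categoryOf": "subtopicOf",
    "segmentOf": "subtopicOf",
    "subtopicOf": "dcterm:isPartOf",
    "partOf": "dcterm:isPartOf",
    "hasPosts": "dcterm:hasPart",
    "hasTopics": "dcterm:hasPart",
    "initialPost": "dcterm:hasPart",
    "concludingPost": "dcterm:hasPart",
    "currentPosts": "rdfc:current",
}

def super_properties(prop):
    """Return the chain of properties entailed by a use of `prop`, nearest first."""
    chain = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        chain.append(prop)
    return chain
```

So a consumer that only understands refersTo can still make use of a post that asserts agreesWith, since super_properties("agreesWith") includes refersTo.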

Usage notes

Associating posts with topics

There are three ways to associate a post with a topic:

Given a post P and a topic T, the statement “P :partOf T.” asserts that P is part of T.
Given a topic T and a list of posts L, the statement “T :hasPosts L.” asserts that each post in L is part of T and that any post not in L is not part of T.
Given topics S and T, the statement “S :subtopicOf T.” asserts that each post which is part of S is also part of T.

Given these rules, it is possible for a graph to assert that a post both is and is not part of a topic. Such an inconsistency indicates that at least one of the statements which gave rise to it is incorrect. Determining which statements are to be believed will depend on the circumstances and possibly user preferences.

A point of clarification: It is possible for an item to be a post and a topic. For instance, in a weblog which allows user comments, a post in the weblog would be considered the topic of its comments. Asserting that a combined topic/post P is part of a topic T does not entail that its posts are part of T. Asserting that P is a subtopic of T does entail that its posts are part of T.
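The distinction between partOf and subtopicOf can be sketched as follows. The function posts_in and the item names (c1, entry, weblog) are hypothetical; for brevity the sketch considers only partOf and subtopicOf statements and leaves hasPosts lists out.

```python
def posts_in(topic, part_of, subtopic_of):
    """Posts entailed to be part of `topic`.

    part_of:     set of (post, topic) pairs, one per "P :partOf T." statement.
    subtopic_of: set of (subtopic, topic) pairs, one per "S :subtopicOf T.".
    """
    found = set()
    pending = [topic]
    seen = set()  # guards against cycles of subtopicOf statements
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        # Direct members of this topic.
        found.update(p for p, t in part_of if t == current)
        # Members of subtopics are members too; partOf alone is not followed.
        pending.extend(s for s, t in subtopic_of if t == current)
    return found

# A weblog entry that is both a post in the weblog and the topic of a comment.
part_of = {("c1", "entry"), ("entry", "weblog")}
only_part_of = posts_in("weblog", part_of, set())
with_subtopic = posts_in("weblog", part_of, {("entry", "weblog")})
```

With only the partOf statements, the comment c1 is not entailed to be part of the weblog; adding the subtopicOf statement brings it in.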

Determining post order within a topic

The value of hasPosts is an ordered sequence of posts. Depending on the nature of a topic, this order may or may not be significant at a user level. [Issue 2] A topic has only one ordering of posts. This means that post A may come before or after post B in topic T, but not both. A graph which asserts multiple values of hasPosts for a single topic is inconsistent unless all the values give the same posts in the same order.
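A minimal sketch of the consistency rule for multiple hasPosts values follows. The function name consistent_order is hypothetical, and each value is assumed to have already been read out of its rdf:List into a Python sequence.

```python
def consistent_order(values):
    """True if every hasPosts value for one topic gives the same posts in the same order."""
    values = [list(v) for v in values]
    return all(v == values[0] for v in values[1:])
```

Two values ("a", "b") and ("a", "b") are consistent, while ("a", "b") and ("b", "a"), or ("a", "b") and ("a", "b", "c"), are not.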

The subtopic properties make different statements about how the order of posts in the subtopic is reflected in the larger topic. Given a topic S which contains only the posts A and B in that order and another topic T:

“S :subtopicOf T.” asserts that A and B are part of T, but says nothing about their order in T.
“S :categoryOf T.” asserts that A and B are part of T and that A comes before B in T, but there may be other posts between them.
“S :segmentOf T.” asserts that A and B are part of T and that A comes before B in T and that no other post is between A and B in T.

Thus, it is possible to indicate whether one post precedes another in a topic without needing to list every post in the topic. For example, one might use an anonymous category to state that a post A precedes a post B in topic T:

[ a :Topic; :categoryOf T; :hasPosts ( A B ) ].

Or, to indicate that A is the immediate predecessor of B:

[ a :Topic; :segmentOf T; :hasPosts ( A B ) ].
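When the full order of the larger topic is known, the two constraints can be checked directly. The sketch below assumes each post appears at most once in a topic's list; the function names is_category and is_segment are hypothetical.

```python
def is_category(sub, whole):
    """True if the posts of `sub` appear in `whole` in the same relative order.

    Other posts may appear between them (the categoryOf constraint).
    """
    remaining = iter(whole)
    # Each membership test consumes the iterator up to the match, so later
    # posts of `sub` must be found after earlier ones.
    return all(post in remaining for post in sub)

def is_segment(sub, whole):
    """True if the posts of `sub` appear in `whole` as one contiguous run
    (the segmentOf constraint)."""
    n = len(sub)
    return any(whole[i:i + n] == sub for i in range(len(whole) - n + 1))

t = ["x", "a", "y", "b"]
```

Here ["a", "b"] satisfies the categoryOf constraint against t but not the segmentOf constraint, because y falls between a and b.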

Issues

1. (Closed) Is use of dcterm:isPartOf consistent with Dublin Core?

Currently, dcterm:isPartOf associates a Post with a Topic, but does not imply a subtopic relation. That is:

A a :Post; dcterm:isPartOf B.
B a :Post, :Topic; dcterm:isPartOf C.

Does not entail:

A dcterm:isPartOf C.

However, the natural-language interpretation of dcterm:isPartOf implies that if A is part of B and B is part of C, then A is part of C. It may be more logical to reintroduce inTopic or some similar term to associate posts with topics.

Resolution: A new predicate, partOf, will be used instead of dcterm:isPartOf.

2. Is a method needed to indicate topics where the order of posts is not significant?

Currently, it is inconsistent to have different post sequences for a single topic. However, some types of topics, such as Usenet newsgroups, do not have a well-defined ordering of posts. In these cases, it is not inconsistent for one set of statements to assert that A precedes B and another to assert that B precedes A, because the ordering is unimportant.

On the other hand, it is possible to say nothing about the order of posts in a topic by not asserting a value for hasPosts and only asserting subtopics with subtopicOf. For a newsgroup, the large volume of posts makes it unlikely that any value could be accurately asserted for hasPosts.

Perhaps it would be reasonable to create a Newsgroup subclass of Topic with the implication that its posts have no inherent order?

Favored resolution: A subclass of Topic will be created for Topics whose posts have a specific order. segmentOf and categoryOf will use this class as their domain and range. The name of this subclass is currently undecided; possibilities include “LinearTopic”, “OrderedTopic”, “Sequence”, and “Thread”.

Changes from previous versions

In addition to a reorganization of the documentation, this document also includes major changes to the TDL vocabulary since the last version. These are:

The inverse reference properties (referredToBy, followedUpBy, commentedOnBy, agreedWithBy, disagreedWithBy, pointedToBy, and quotedBy) have been eliminated. Implementation experience suggests that inverse properties complicate queries while providing little benefit.
Archive has been eliminated. It created problems in circumstances where a resource was both an Archive and a Topic, as it could only have one value for hasPosts. This would prevent any post outside the Archive from being part of the Topic.
Topics may now contain subtopics, which are declared with subtopicOf or hasTopics. The Topics listed in hasTopics are formal subtopics, such as categories in a weblog or explicit threads in a message board.
Weblog is now a subclass of Topic.
Forum has been eliminated. The ability for Topics to declare formal subtopics makes it unnecessary.
The membership and containment properties (inArchive, inTopic, inForum, inWeblog, hasPost, hasArchive, hasTopic, hasForum, hasWeblog) have been eliminated in favor of dcterm:isPartOf.
The properties hasPosts, hasTopics, and hasWeblogs have been added. The use of rdf:List makes it practical for these properties to entail non-membership of items not included in the value.
The sequence properties (first, prev, next, and last) have been eliminated. New properties, initialPost and concludingPost, are used to indicate the start and end points in a Topic, and hasPosts is used to indicate sequence.
The properties categoryOf and segmentOf have been added. These extend the meaning of subtopicOf to make assertions about the order of posts in the larger topic. Combined with hasPosts, they provide a far more flexible method for describing the order of posts than prev and next.
currentPosts is now a sub-property of rdfc:current. Its domain has changed from Weblog to Topic and its range has changed from rdf:Seq to rdf:List.
The properties hasRSSFeedAt, hasTDLFeedAt, and hasTDLContentFeedAt have been eliminated in favor of RDF Channel’s method of using rdfs:seeAlso with an rdfc:Feed.

Subsequent changes

2002-11-19: Issue 1 resolved, adding predicate partOf.
