ZedneWeb / Specification of SDF

Syndication Directory Format (SDF)

SDF is an XML format for describing the relationship between channels and the feeds that syndicate them. Its design is modular, for easy extension, and it is defined as a subset of RDF/XML so that it can be read by generic RDF processors.

Contents

  1. Introduction
    1. Channels and feeds
    2. Media type and file extension
  2. Core module
    1. The root
    2. Channel descriptions
    3. Feed descriptions
    4. Supplemental elements
    5. Titles
    6. Other patterns
  3. TDL module

Introduction

As syndication becomes more prevalent on the web, it will steadily become more useful to provide information about the syndicated feeds associated with a particular web site. For example, a convention using HTML’s link element enables user agents to automatically locate an RSS feed associated with a web page. This particular method becomes less useful when multiple feeds are associated with a web site, as the only information it provides about feeds is a textual description which may or may not allow the user to decide which feeds are desired.

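The feeds associated with a channel may vary in four ways:

  • Format: the syndication format in which the feed is published.
  • Language: the language of the feed’s content.
  • Detail: whether the feed gives item titles only, excerpts, or full item content.
  • Coverage: how much of the channel’s content the feed includes.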

Of these, only language could be effectively handled through HTTP content negotiation. (Format might be handled with a parameterized media type, such as “application/rss+xml; version=2.0”, but that seems improbable, given that “application/rss+xml” is itself not a formal media type.) Web sites which offer multiple feeds usually provide a listing of available feeds somewhere, but this requires the user to locate the list and interpret its contents.

What is needed, then, is a machine-readable format for describing the feeds associated with a web site, much in the way that RSS provided a machine-readable format for channels. SDF is one such format.

SDF is a strictly defined XML format, so it can be parsed with lower-level tools, such as a SAX parser or a Perl script. It is defined in RELAX NG, enabling validation with a number of tools. The schema is modular and designed for easy extension.
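As a rough illustration of that lower-level route, the sketch below uses Python’s standard-library SAX interface, with namespace processing enabled, to list the top-level descriptions (channels, feeds, and anything else directly inside rdf:RDF) together with their rdf:about URIs. The class and function names are hypothetical, and the sketch performs no validation.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TopLevelHandler(ContentHandler):
    """Collects (namespace URI, local name, rdf:about) for each child of rdf:RDF."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.entries = []

    def startElementNS(self, name, qname, attrs):
        self.depth += 1
        if self.depth == 2:  # depth 1 is rdf:RDF itself; depth 2 is a description
            uri, local = name
            self.entries.append((uri, local, attrs.get((RDF_NS, "about"))))

    def endElementNS(self, name, qname):
        self.depth -= 1


def list_descriptions(data: bytes):
    """Hypothetical helper: return the top-level descriptions in an SDF document."""
    handler = TopLevelHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (namespace, name) pairs
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.entries
```

Given a document containing a Channel and an ItemTitleFeed, it returns one (namespace, local name, URI) triple per description, in document order.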

At the same time, SDF is a subset of RDF/XML, meaning that any valid SDF document is also a valid RDF/XML document. Thus, a generic RDF tool can parse and understand an SDF document. SDF has been designed around existing RDF vocabularies, such as the Dublin Core, RDF Channel, and TDL.

Channels and feeds

Media type and file extension

The appropriate media type for SDF is probably something like “application/sdf+rdf+xml”, but registering such a type should wait until people start using it. For now, “application/xml” is the best choice; once the RDF specifications are complete and “application/rdf+xml” is registered, that type will be preferable.

File extensions are frowned upon as a means of identifying file types, but users are free to use “.xml”, “.rdf”, or “.sdf” as appropriate.

Core module

The core module defines the basic structure of an SDF document.

It uses elements from four namespaces:

  • the RDF namespace (http://www.w3.org/1999/02/22-rdf-syntax-ns#), identified by the prefix “rdf:”;
  • the RDF Channel namespace (http://www.eyrie.org/~zednenem/2002/rdfchannel#), used here as the default namespace, without a prefix;
  • the Dublin Core namespace (http://purl.org/dc/elements/1.1/), identified by the prefix “dc:”;
  • the extended Dublin Core namespace (http://purl.org/dc/terms/), identified by the prefix “dcq:”.


default namespace = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"
namespace dc = "http://purl.org/dc/elements/1.1/"
namespace dcq = "http://purl.org/dc/terms/"
namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

The root


start = element rdf:RDF { channel+ & feed+ & supplement* }

The root of an SDF document is an rdf:RDF element containing one or more channel descriptions, one or more feed descriptions, and zero or more supplemental elements in any order.
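A minimal structural check of this rule could be sketched in Python with ElementTree as follows. The function name root_problems is hypothetical, and the check recognises only the core-module element names, so a document whose only channel is described by a module element (for example tdl:Weblog) would be reported as lacking a channel.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RC = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"
CORE_CHANNEL = "{%s}Channel" % RC
CORE_FEEDS = {"{%s}%s" % (RC, name)
              for name in ("Feed", "ItemTitleFeed", "ShortItemFeed", "FullItemFeed")}


def root_problems(data: bytes):
    """Report deviations from the core root pattern (empty list if none).

    Only core-module element names are counted; module elements are ignored.
    """
    root = ET.fromstring(data)
    problems = []
    if root.tag != "{%s}RDF" % RDF:
        problems.append("root element is not rdf:RDF")
    tags = [child.tag for child in root]
    if CORE_CHANNEL not in tags:
        problems.append("no Channel description")
    if not CORE_FEEDS.intersection(tags):
        problems.append("no feed description")
    return problems
```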

To make life easier for ad-hoc parsers, namespaces should also be declared in the root element. If this is not feasible, the mapping between namespaces and prefixes should at least be consistent within the document.
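This advice can be checked mechanically. The Python sketch below, using the standard expat bindings, reports namespace declarations made below the root element and prefixes that are bound to more than one namespace within the same document. The function name and the shape of its result are assumptions made for illustration.

```python
import xml.parsers.expat


def namespace_report(data: bytes):
    """Return (below_root, conflicts) for an XML document.

    below_root: (prefix, uri) declarations made below the root element.
    conflicts:  prefixes bound to more than one URI in the document.
    The default namespace has prefix None (reported as "" in conflicts).
    """
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    depth = 0
    bindings = {}  # prefix -> set of URIs it has been bound to
    below_root = []

    def on_namespace(prefix, uri):
        # Expat calls this before the start tag that carries the declaration,
        # so depth == 0 means the declaration sits on the root element.
        bindings.setdefault(prefix, set()).add(uri)
        if depth > 0:
            below_root.append((prefix, uri))

    def on_start(name, attrs):
        nonlocal depth
        depth += 1

    def on_end(name):
        nonlocal depth
        depth -= 1

    parser.StartNamespaceDeclHandler = on_namespace
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(data, True)
    conflicts = sorted(prefix or "" for prefix, uris in bindings.items() if len(uris) > 1)
    return below_root, conflicts
```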

Channel descriptions


channel = element Channel { channel.properties }

channel.properties =
    attribute rdf:about { xsd:anyURI }
  & title
  & element dc:description { language?, text }?
  & element dc:language { xsd:language }?

A channel description consists of a Channel element with an rdf:about attribute containing the URI reference identifying the channel. Its contents include a title pattern and optional dc:description and dc:language subelements in any order.

The dc:description element contains a textual description of the channel and may optionally have an xml:lang attribute providing an RFC 3066 language code.

The dc:language element contains an RFC 3066 language code indicating the primary language of the channel.

Example:


<Channel rdf:about="http://news.example.org/">
  <dc:title>Example News</dc:title>
  <dc:description>Recent articles at Example News.</dc:description>
  <dc:language>en</dc:language>
</Channel>
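Reading these properties with Python’s ElementTree might look like the sketch below; the helper names and the dictionary layout are hypothetical. It looks only for core Channel elements, which, as the next paragraph explains, is not enough to find every channel.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rc": "http://www.eyrie.org/~zednenem/2002/rdfchannel#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # ElementTree's name for xml:lang


def channel_info(channel):
    """Collect the core properties of one channel description as a dict."""
    def child(tag):
        return channel.find(tag, NS)

    title, desc, lang = child("dc:title"), child("dc:description"), child("dc:language")
    return {
        "uri": channel.get(RDF_ABOUT),
        "title": None if title is None else title.text,
        "description": None if desc is None else desc.text,
        "description_lang": None if desc is None else desc.get(XML_LANG),
        "language": None if lang is None else lang.text,
    }


def core_channels(root):
    """Core Channel elements directly under rdf:RDF (module elements are skipped)."""
    return [channel_info(c) for c in root.findall("rc:Channel", NS)]
```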

Forward-compatible parsing: Modules may extend the channel pattern to allow alternate elements to describe channels (cf. the TDL module). Thus, there is no foolproof way to identify a channel, except that any URI occurring in a feed’s syndicates element must identify a channel.

Modules may also allow new subelements in the channel.properties pattern. User agents are free to ignore those they don’t understand.
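The one reliable rule, that every URI in a feed’s syndicates element identifies a channel, can be turned into a heuristic for finding channel descriptions whatever element name a module gives them. The Python sketch below (hypothetical helper names) collects those URIs and returns the top-level elements whose rdf:about matches one of them; a channel that no feed syndicates will not be found this way.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RC = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"
ABOUT = "{%s}about" % RDF
RESOURCE = "{%s}resource" % RDF
SYNDICATES = "{%s}syndicates" % RC


def syndicated_uris(root):
    """URIs named by syndicates elements of top-level descriptions;
    per the forward-compatible parsing rule, each identifies a channel."""
    uris = set()
    for child in root:
        for syn in child.findall(SYNDICATES):
            if syn.get(RESOURCE) is not None:
                uris.add(syn.get(RESOURCE))
    return uris


def channel_descriptions(root):
    """Top-level elements describing a syndicated channel, whatever their name."""
    targets = syndicated_uris(root)
    return [child for child in root if child.get(ABOUT) in targets]
```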

Feed descriptions


feed =
    element Feed { feed.properties }
  | element ItemTitleFeed { feed.properties }
  | element ShortItemFeed { feed.properties }
  | element FullItemFeed { feed.properties }

feed.properties =
    attribute rdf:about { xsd:anyURI }
  & element syndicates { resource }
  & element dc:format { resource }
  & title?
  & element dc:description { language?, text }?
  & element dc:language { xsd:language }?

A feed description consists of a Feed, ItemTitleFeed, ShortItemFeed, or FullItemFeed element with an rdf:about attribute giving the URI where the feed may be found. Its contents include a syndicates element, a dc:format element, an optional title pattern, and optional dc:description and dc:language elements in any order.

The element used to describe the feed indicates the level of detail in the feed. An ItemTitleFeed provides titles of items, but no descriptions. A ShortItemFeed provides descriptions or excerpts of items. A FullItemFeed includes the complete content of each item (naturally, this is only possible for feeds which syndicate primarily textual channels). It is never incorrect to identify a feed with a Feed element; doing so merely provides less information about the feed’s content.
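A user agent might use the element name to choose among a channel’s feeds. The Python sketch below assumes a simple numeric ranking of the three specific feed elements (the numbers are not part of SDF) and treats a plain Feed, or any unrecognised feed element, as saying nothing about detail. Preferring the most detailed feed is only one possible policy, and the function names are hypothetical.

```python
RC = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"

# Assumed ordering of the core feed elements from least to most detail.
# A plain Feed, or a feed element from another module, carries no
# information about detail and is absent from this table.
DETAIL_RANK = {
    "{%s}ItemTitleFeed" % RC: 1,  # item titles, no descriptions
    "{%s}ShortItemFeed" % RC: 2,  # descriptions or excerpts
    "{%s}FullItemFeed" % RC: 3,   # complete content of each item
}


def detail_rank(tag):
    """Return the assumed rank for a feed element's tag, or None if unspecified."""
    return DETAIL_RANK.get(tag)


def most_detailed(feeds):
    """Given (tag, uri) pairs for one channel's feeds, return the uri of the
    feed with the highest known detail, or None if no feed states its level."""
    ranked = [(detail_rank(tag), uri) for tag, uri in feeds if detail_rank(tag) is not None]
    return max(ranked)[1] if ranked else None
```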

The syndicates element must have an rdf:resource attribute giving a URI reference identifying the channel this feed syndicates.

The dc:format element must have an rdf:resource attribute giving a URI identifying the format of the feed. URIs for popular syndication formats are defined in RDF Channel.

The dc:description element contains a textual description of the feed and may optionally have an xml:lang attribute providing an RFC 3066 language code.

The dc:language element contains an RFC 3066 language code indicating the primary language of the feed. This need not be the same as the primary language of the syndicated channel.
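Because a feed’s language may differ from its channel’s, a user agent looking for feeds in a given language should check each feed’s own dc:language. The Python sketch below does so, matching on the primary subtag of the RFC 3066 code. Treating a feed with no dc:language as having its channel’s language is an assumption of the sketch, not a rule of SDF, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RC = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"
DC = "http://purl.org/dc/elements/1.1/"
ABOUT = "{%s}about" % RDF


def declared_language(elem):
    """Text of the element's dc:language child, or None."""
    node = elem.find("{%s}language" % DC)
    return node.text.strip() if node is not None and node.text else None


def primary_subtag(tag):
    return tag.split("-")[0].lower()


def feeds_in_language(root, lang):
    """URIs of feeds whose language matches lang on the primary subtag.

    Assumption: a feed without its own dc:language shares the language
    of the channel it syndicates.
    """
    channel_lang = {c.get(ABOUT): declared_language(c) for c in root if c.get(ABOUT)}
    matches = []
    for feed in root:
        syn = feed.find("{%s}syndicates" % RC)
        if syn is None:
            continue  # not a feed description
        effective = declared_language(feed) or channel_lang.get(syn.get("{%s}resource" % RDF))
        if effective and primary_subtag(effective) == primary_subtag(lang):
            matches.append(feed.get(ABOUT))
    return matches
```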

Example:


<!-- feeds may provide different levels of detail -->
<ItemTitleFeed rdf:about="http://news.example.org/feeds/headlines">
  <dc:format rdf:resource="http://www.eyrie.org/~zednenem/2002/rdfchannel#TAXES"/>
  <syndicates rdf:resource="http://news.example.org/"/>
</ItemTitleFeed>

<ShortItemFeed rdf:about="http://news.example.org/feeds/shortitems">
  <dc:format rdf:resource="http://www.eyrie.org/~zednenem/2002/rdfchannel#TAXES"/>
  <syndicates rdf:resource="http://news.example.org/"/>
</ShortItemFeed>

Forward-compatible parsing: Modules may extend the feed pattern to allow alternate elements to describe feeds. A child of rdf:RDF describes a feed if and only if it contains an rdf:about attribute and a syndicates subelement.

Modules may also allow new subelements in the feed.properties pattern. User agents are free to ignore those they don’t understand.
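The feed-detection rule translates almost directly into code. In this Python sketch (hypothetical function name), any child of rdf:RDF with an rdf:about attribute and a syndicates subelement counts as a feed, so feed elements defined by modules the user agent does not know about are still found. The ex:AudioFeed element used to exercise it belongs to a made-up namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RC = "http://www.eyrie.org/~zednenem/2002/rdfchannel#"
ABOUT = "{%s}about" % RDF
SYNDICATES = "{%s}syndicates" % RC


def feed_descriptions(root):
    """Children of rdf:RDF that describe feeds: exactly those with an
    rdf:about attribute and a syndicates subelement, whatever the element name."""
    return [child for child in root
            if child.get(ABOUT) is not None and child.find(SYNDICATES) is not None]
```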

Supplemental elements


supplement = notAllowed

The supplement pattern is a hook for future extension. In the core module, it has no allowed elements.

Titles


title =
    element dc:title { text }
  | ( element dc:title { language, text }
    & element dcq:alternate { language, text }* )

The title pattern, used to give the name of a channel or feed, consists of either a single dc:title element or a dc:title element and zero or more dcq:alternate elements, all with xml:lang attributes. In other words, there may be only one dc:title element; alternate titles must be given by dcq:alternate elements. If any alternate titles are given, then they and the primary title must be language tagged.

Example:


<dc:title xml:lang="de">Das Boot</dc:title>
<dcq:alternate xml:lang="en">The Boat</dcq:alternate>

An English-language user agent might present this title as “Das Boot (The Boat)”.
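One way a user agent might implement such a presentation is sketched below in Python. The policy (show dc:title alone when its language matches the user’s, otherwise append the first dcq:alternate whose language matches, in parentheses) is an assumption, as is the comparison on primary subtags only; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCQ = "http://purl.org/dc/terms/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _primary(tag):
    """Primary subtag of a language code, lower-cased ('en-US' -> 'en')."""
    return tag.split("-")[0].lower() if tag else None


def display_title(elem, user_lang):
    """Render the title pattern of a channel or feed element for a user.

    Assumed policy: dc:title alone if its language matches; otherwise
    dc:title followed by the first matching dcq:alternate in parentheses;
    with no match at all, dc:title alone.
    """
    title = elem.find("{%s}title" % DC)
    if title is None:
        return None
    want = _primary(user_lang)
    if _primary(title.get(XML_LANG)) == want:
        return title.text
    for alt in elem.findall("{%s}alternate" % DCQ):
        if _primary(alt.get(XML_LANG)) == want:
            return "%s (%s)" % (title.text, alt.text)
    return title.text
```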

Other patterns


language = attribute xml:lang { xsd:language }
resource = attribute rdf:resource { xsd:anyURI }

The language and resource patterns are abbreviations for common patterns: the xml:lang attribute, which takes an RFC 3066 language code, and the rdf:resource attribute, which takes a URI or URI reference.

TDL module

This module allows channels to be identified as being weblogs or topics (as defined by TDL). It introduces terms from the TDL namespace (http://www.eyrie.org/~zednenem/2002/web-threads/), here identified by the prefix “tdl:”.


namespace tdl = "http://www.eyrie.org/~zednenem/2002/web-threads/"

channel |=
    element tdl:Topic { topic.properties }
  | element tdl:Weblog { topic.properties }

topic.properties =
    channel.properties
  & ( element tdl:subtopicOf { resource }
    | element tdl:categoryOf { resource } )?

The channel pattern is extended to allow tdl:Topic and tdl:Weblog elements to define channels. Their contents include the normal channel properties and an optional tdl:subtopicOf or tdl:categoryOf subelement.

These properties are used to indicate that one topic or weblog is a subtopic of another. A subtopic contains no posts which are not in its parent topic. A category is a subtopic which preserves the relative order of posts.


<tdl:Weblog rdf:about="http://blog.example.com/">
  <dc:title>Example weblog</dc:title>
</tdl:Weblog>

<tdl:Topic rdf:about="http://blog.example.com/topics/technology">
  <dc:title>Technology</dc:title>
  <tdl:categoryOf rdf:resource="http://blog.example.com/"/>
</tdl:Topic>
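The definitions of subtopic and category can be pinned down with a small Python sketch. SDF itself does not list posts; here each topic is represented, hypothetically, by a list of post identifiers in the order the topic presents them. A subtopic’s posts must be a subset of its parent’s, and a category’s posts must additionally appear as a subsequence of the parent’s list.

```python
def is_subtopic(sub_posts, parent_posts):
    """True if the subtopic contains no posts that are not in its parent."""
    return set(sub_posts) <= set(parent_posts)


def is_category(sub_posts, parent_posts):
    """True if sub_posts is a subtopic that also keeps the parent's relative
    order of posts, i.e. it is a subsequence of parent_posts."""
    remaining = iter(parent_posts)
    # "post in remaining" advances the iterator past the match, so each
    # later post must be found further along the parent's list.
    return all(post in remaining for post in sub_posts)
```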

References

Issues

1. RDF Channel cannot describe variations in coverage

2. Some sort of subject indication could be useful

Given a weblog which has several subtopics with their own feeds, it could be useful to categorize these subtopics by subject. For example:


<tdl:Weblog rdf:about=".">
  <dc:title>Example weblog</dc:title>
</tdl:Weblog>

<tdl:Topic rdf:about="topics/technology">
  <tdl:categoryOf rdf:resource="."/>
  <dc:subject>Technology</dc:subject>
</tdl:Topic>

<tdl:Topic rdf:about="topics/technology/xml">
  <tdl:categoryOf rdf:resource="topics/technology"/>
  <dc:subject>XML</dc:subject>
</tdl:Topic>

In order for this to be useful for software, the subjects need to come from a controlled vocabulary. Alternatively, we could identify subjects by URI, but then we have to have a way to present the information to users. Requiring a label for every subject property seems inefficient, but requiring implementations to have a human-understandable description for every URI used to identify subjects is impractical.

Perhaps there could be a way to say “This file uses subject URIs defined by this resource”, but then we would need a format for the subject descriptions themselves.

3. Should we allow multiple tdl:subtopicOf properties?

e.g., "Guns&Butter" tdl:subtopicOf "Guns", "Butter".

4. How to model directory itself
  • Don’t bother.
    • Simple
    • No way to make sure document is a directory
  • Require initial “Directory” element.
    • Natural place to put it
    • Overkill for simple directories
  • Optional initial “Directory” element.
    • There when useful, otherwise not
    • No general way to make sure document is a directory

5. How to associate channels with directory

(Assumes issue 4 results in directory modelling.)

6. How to associate feeds with channel

Dave Menendez