ZedneWeb
Quidquid latine dictum sit, altum videtur

October 3, 2000

I was planning to discuss the wild, wacky world of logic programming today (seriously!), but I'm not feeling awake enough for an involved entry. Instead, here's a little complaint about the very structure of HTML itself.

(For those of you unfamiliar with HTML, a quick introduction to the terms used here: An HTML document (such as a web page) consists of a set of elements. Each element can contain text and/or other elements. Examples include headings, paragraphs, lists, and the document itself. In the file, an element is written like so: "<element>contents</element>", where the bits in angle brackets are the start and end tags that enclose the contents of the element.)

HTML, and its successor XHTML, both provide six elements for marking up section headings (creatively enough called h1, h2, and so on through h6). These are intended to be used hierarchically: the h1 heading applies to the entire document, each section of the document gets an h2 as its heading, each subsection of those gets an h3, and so forth. The resulting structure might look like this:

<h1>Heading 0</h1>
Section 0
  
<h2>Heading 1</h2>
Section 1
  
<h2>Heading 2</h2>
Section 2
  
<h3>Heading 2.1</h3>
Section 2.1

This isn't a horrible system. It's fairly easy to convert a well-structured document to this general form, but it's also easy to construct a document that makes no sense whatsoever this way. An h3 element might directly follow an h1, which is legal but nonsensical from the structural standpoint we were hoping for. There's also no easy way to refer to Section 1 as a whole (say, to make its background green with a style sheet), since only its heading is an element; nothing encloses the heading together with the content that follows it.
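Since the markup itself won't stop an h3 from following an h1, the best a tool can do is check after the fact. Here's a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it only flags headings that jump more than one level deeper than the one before (a document that opens with an h2 counts as skipping h1).

```python
from html.parser import HTMLParser


class HeadingSkipChecker(HTMLParser):
    """Collects h1-h6 levels in document order and flags skipped levels."""

    def __init__(self):
        super().__init__()
        self.levels = []    # heading levels seen so far, in order
        self.problems = []  # (previous level, offending level) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H3" arrives as "h3".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            previous = self.levels[-1] if self.levels else 0
            # Going deeper by more than one level (e.g. h1 -> h3) skips a level.
            if level > previous + 1:
                self.problems.append((previous, level))
            self.levels.append(level)


def find_skipped_levels(html):
    """Return (previous, offending) level pairs for every skipped heading level."""
    checker = HeadingSkipChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(find_skipped_levels("<h1>Title</h1>text<h3>Oops</h3>more text"))
```

Note that this can only warn: the h1-to-h3 jump is perfectly valid HTML, so nothing in the language itself forbids it.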

A better way would have been to create something like a section element. A section would consist of an optional heading element, followed by an optional stretch of "block" elements (paragraphs, lists, and such), and then zero or more nested sections. Our example structure then becomes:

<section>
  <heading>Heading 0</heading>
  Section 0
  
  <section>
    <heading>Heading 1</heading>
    Section 1
  </section>
    
  <section>
    <heading>Heading 2</heading>
    Section 2
    
    <section>
      <heading>Heading 2.1</heading>
      Section 2.1
    </section>
  </section>
</section>
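To make the proposed structure concrete, here's a minimal Python sketch of the section model along with a routine that recovers it from the flat h1-h6 form shown earlier. The Section class and the nest function are hypothetical names for illustration, and headings and content are kept as plain strings rather than real elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    """Hypothetical model of the proposed element: an optional heading,
    some block content, then zero or more nested sections."""
    heading: Optional[str] = None
    blocks: List[str] = field(default_factory=list)
    subsections: List["Section"] = field(default_factory=list)


def nest(flat: List[Tuple[int, str, str]]) -> Section:
    """Turn (level, heading, content) triples taken from h1-h6 markup
    into nested sections under a heading-less document-level root."""
    root = Section()        # stands in for the whole document (level 0)
    stack = [(0, root)]     # currently open sections, outermost first
    for level, heading, content in flat:
        # A heading closes every open section at its own level or deeper.
        while stack[-1][0] >= level:
            stack.pop()
        section = Section(heading, [content])
        stack[-1][1].subsections.append(section)
        stack.append((level, section))
    return root


doc = nest([(1, "Heading 0", "Section 0"), (2, "Heading 1", "Section 1"),
            (2, "Heading 2", "Section 2"), (3, "Heading 2.1", "Section 2.1")])
print([s.heading for s in doc.subsections[0].subsections])
```

Because each heading only closes sections at its own level or deeper, a skipped level (an h3 straight after an h1) simply ends up nested one level down instead of breaking the conversion.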

A bit easier to see, isn't it? In any case, it's much easier for a computer to work with. Each section--along with all its subsections--can now be treated as a unit. Problems with authors using heading elements out of sequence vanish. Editing and browsing tools can take a page from outliners and allow users to expand and contract sections if they want to see more or less detail.
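The outliner idea falls out almost for free once sections are real units. As a rough sketch, assuming a hypothetical, pared-down Section class that holds only a heading and its subsections, this lists headings down to a chosen depth and marks a collapsed section that has hidden subsections with "[+]".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    """Hypothetical, pared-down section: block content omitted for brevity."""
    heading: Optional[str] = None
    subsections: List["Section"] = field(default_factory=list)


def outline(section: Section, max_depth: int, depth: int = 0) -> List[str]:
    """Return an indented list of headings, collapsing anything below max_depth."""
    hidden = depth >= max_depth and len(section.subsections) > 0
    marker = " [+]" if hidden else ""
    lines = ["  " * depth + (section.heading or "(untitled)") + marker]
    if depth < max_depth:
        # Expanded: recurse into each subsection as a unit.
        for sub in section.subsections:
            lines.extend(outline(sub, max_depth, depth + 1))
    return lines


doc = Section("Heading 0", [Section("Heading 1"),
                            Section("Heading 2", [Section("Heading 2.1")])])
print("\n".join(outline(doc, 1)))
```

Expanding or collapsing is just a matter of changing max_depth; no searching for "the next heading of equal or higher level" is needed.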

Want to shift a section further down in the hierarchy? Just enclose it in another section. Want to move a low-level section to its own page? Just copy and paste it; there's no need to adjust all the heading tags to make sure they're at the right level.
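The reason no tags need adjusting is that a heading's level can be computed from how deeply its section is nested. As a sketch, this hypothetical to_flat_html function (with another illustrative Section class) emits old-style h1-h6 markup from the nested form, clamping anything deeper than six levels to h6 since HTML has no h7. It does no escaping and assumes trusted text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    """Hypothetical section: optional heading, plain text, nested sections."""
    heading: Optional[str] = None
    text: str = ""
    subsections: List["Section"] = field(default_factory=list)


def to_flat_html(section: Section, depth: int = 1) -> str:
    """Render h1-h6 markup, deriving each heading level from nesting depth."""
    parts = []
    if section.heading is not None:
        level = min(depth, 6)  # HTML stops at h6, so deeper levels are clamped
        parts.append(f"<h{level}>{section.heading}</h{level}>")
    if section.text:
        parts.append(section.text)
    for sub in section.subsections:
        parts.append(to_flat_html(sub, depth + 1))
    # Skip empty pieces so heading-less, empty sections add no blank lines.
    return "\n".join(p for p in parts if p)


moved = Section("Heading 2.1", "Section 2.1")
print(to_flat_html(moved))                          # standalone: gets an h1
print(to_flat_html(Section(subsections=[moved])))   # wrapped: becomes an h2
```

The section object itself never changes; only where it sits in the tree does, and the levels follow from that.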

The system is flexible enough that it can be used for pure outlines (where only section and heading elements are used) or for narrative fiction (where sections might correspond to scenes). It still allows bizarre markup, like:

<section>
  <section>
    <heading>Unnecessarily-nested heading</heading>
  </section>
  some textual content
</section>

But preventing that is probably more trouble than it's worth.