I was planning to discuss the wild, wacky world of logic programming today (seriously!), but I'm not feeling awake enough for an involved entry. Instead, here's a little complaint about the very structure of HTML itself.
(For those of you unfamiliar with HTML, a quick introduction to the terms
used here: An HTML document (such as a web page) consists of a set of elements.
Each element can contain text and/or other elements. Examples include headings,
paragraphs, lists, and the document itself. In the file, an element is written
like so: "<tag>contents</tag>",
where the bits in angle brackets are the start and end tags that enclose the
contents of the element.)
HTML, and its successor XHTML, both provide six elements used for describing
section headers (creatively enough called h1, h2, and
so forth, up to h6). These are intended to be used hierarchically: The h1
heading applies to the entire document, and each section of it could have
an h2 as its header, and any subsection of those would have an
h3 as its header, and so forth, giving a structure that might look
like this:
<h1>Heading 0</h1>
Section 0
<h2>Heading 1</h2>
Section 1
<h2>Heading 2</h2>
Section 2
<h3>Heading 2.1</h3>
Section 2.1
This isn't a horrible system. It's fairly easy to convert a
well-structured document to this general form, but it's also easy to
construct a document that makes no sense whatsoever this way. An
h3 element might directly follow an h1, which
is legal, but nonsensical from the structured standpoint we were hoping
for. There's also no easy way to refer to Section 1 (say, to make its
background green with a style sheet).
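To make the out-of-sequence problem concrete, here's a minimal Python sketch of the kind of check a validator has to bolt on, since the markup itself permits the jump. It uses the standard library's html.parser module; the names HeadingLevelCollector and find_skipped_levels are made up for illustration and aren't part of any real tool.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingLevelCollector(HTMLParser):
    """Record the level (1-6) of each heading start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in HEADING_TAGS:
            self.levels.append(int(tag[1]))


def find_skipped_levels(html):
    """Return (previous, current) pairs where a heading skips a level going down."""
    collector = HeadingLevelCollector()
    collector.feed(html)
    collector.close()
    skips = []
    previous = 0  # assumption: treat the start of the document as level 0
    for level in collector.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips
```

Feeding it "<h1>Title</h1><h3>Oops</h3>" reports the pair (1, 3), while the well-behaved example structure above reports nothing.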
A better way would have been to create something like a section
element. A section would consist of an optional heading element,
followed by an optional stretch of "block" elements (paragraphs, lists, and
such), and then zero or more nested sections. Our example structure then
becomes:
<section>
  <heading>Heading 0</heading>
  Section 0
  <section>
    <heading>Heading 1</heading>
    Section 1
  </section>
  <section>
    <heading>Heading 2</heading>
    Section 2
    <section>
      <heading>Heading 2.1</heading>
      Section 2.1
    </section>
  </section>
</section>
A bit easier to see, isn't it? In any case, it's much easier for a computer to work with. Each section--along with all its subsections--can now be treated as a unit. Problems with authors using heading elements out of sequence vanish. Editing and browsing tools can take a page from outliners and allow users to expand and contract sections if they want to see more or less detail.
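For comparison, here's roughly what a tool has to do today to recover those units from flat h1 through h6 headings. This is only a sketch in Python under a couple of assumptions: the headings arrive as (level, title) pairs in document order, and nest_headings is a hypothetical helper, not part of any library.

```python
def nest_headings(headings):
    """Group flat (level, title) pairs into nested dicts, one per section."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # chain of currently open sections, outermost first
    for level, title in headings:
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

With nested section elements, the document's own structure already is this tree, so none of the stack bookkeeping would be needed.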
Want to shift a section further down in the hierarchy? Just enclose it in a higher-level section. Want to move a low-level section to its own page? Just copy and paste; there's no need to adjust all the heading tags to make sure they're at the right level.
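The reason no renumbering is needed is that a heading's level is simply its nesting depth. Here's a small Python sketch of that idea, assuming the hypothetical section and heading markup is well-formed XML. The names flatten_sections and outline are illustrative only; the parsing uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def flatten_sections(section, depth=1):
    """Yield (level, heading_text) for a <section> and all nested sections."""
    heading = section.find("heading")  # direct child only
    if heading is not None:
        yield depth, (heading.text or "").strip()
    for child in section.findall("section"):
        yield from flatten_sections(child, depth + 1)


def outline(markup):
    """Parse section/heading markup and list each heading with its depth."""
    root = ET.fromstring(markup)
    return list(flatten_sections(root))
```

Wrapping a document in one more section element pushes every heading down a level without touching any heading tag, which is exactly the "just enclose it" operation described above.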
The system is flexible enough that it can be used for pure outlines (where only section and heading elements are used) or for narrative fiction (where sections might correspond to scenes). It still allows bizarre markup, like:
<section>
  <section>
    <heading>Unnecessarily-nested heading</heading>
  </section>
  some textual content
</section>
But preventing that is probably more trouble than it's worth.