spin

(Translate thread, an HTML macro language, into XHTML)

SYNOPSIS

spin [-dhv] [-e pattern ...] [-s url] [-o overrides] source [output]

spin [-s url] [-o overrides] -f

REQUIREMENTS

Perl 5.005 or later and the Image::Size and Text::Balanced modules. Also expects to find faq2html, cvs2xhtml, cl2xhtml, and pod2thread to convert certain types of files. The Git::Repository module is required to determine last change dates for thread source from Git history.

DESCRIPTION

spin implements a fairly simple macro language, named "thread", that expands into XHTML. It also serves as a tool to maintain a set of web pages: it updates a staging area with the latest versions, converts pages written in thread, and runs faq2html where directed.

When invoked with the -f option, spin works in filter mode, reading thread from stdin and writing the converted output to stdout. Some features, such as appending a signature or navigation links, are disabled in this mode.

If source is a regular file, output should be the name of the file into which to put the output, and spin will process only that one file (which is assumed to be thread). output may be omitted to send the output to standard output. The same features are disabled in this mode as in filter mode.

Otherwise, each file in the directory source is examined recursively. Each one is either copied verbatim into the same relative path under output, used as instructions to an external program (see the details on converters below), or converted to HTML. The HTML output for external programs or for converted pages is put under output with the same file name but with the extension changed to .html. Missing directories are created. If the -d flag is given, regular files in the output directory that do not correspond to files in the source directory will be deleted; directories without a counterpart in the source directory are reported but not deleted (see the description of -d under OPTIONS).

Files that end in .th are assumed to be in thread and are turned into HTML. For the details of the thread language, see THREAD LANGUAGE below.

Files that end in various other extensions are taken to be instructions to run an external converter on a file. The first line of such a pointer file should be the path to the source file, the second line any arguments to the converter, and the third line the style sheet to use if not the default. Which converter to run is based on the extension of the file as follows:

    .changelog  cl2xhtml
    .faq        faq2html
    .log        cvs log <file> | cvs2xhtml
    .rpod       pod2thread <file> | spin -f

All other files not beginning with a period are copied as-is, except that files or directories named CVS, Makefile, or RCS are ignored. As an exception, .htaccess files are also copied over.

spin looks for a file named .sitemap at the top of the source directory and reads it for navigation information to generate the navigation links at the top and bottom of each page. The format of this file is one line per web page, with indentation showing the tree structure, and with each line formatted as a partial URL, a colon, and a page description. If two pages at the same level aren't related, a line with three dashes should be put between them at the same indentation level. The partial URLs should start with / representing the top of the hierarchy (the source directory), but all generated links will be relative.

Here's an example of a simple .sitemap file:

    /personal/: Personal Information
      /personal/contact.html: Contact Information
      ---
      /personal/projects.html: Current Projects
    /links/: Links
      /links/lit.html: Other Literature
      /links/music.html: Music
      /links/sf.html: Science Fiction and Fantasy

This defines two sub-pages of the top page, /personal/ and /links/. /personal/ has two pages under it that are not part of the same set (and therefore shouldn't have links to each other). /links/ has three pages under it which are part of a set and should be linked between each other.

If .sitemap is present, this navigation information will also be put into the <head> section of the resulting HTML file as <link> tags. Some browsers will display this information as a navigation toolbar.

spin also looks for a file named .signature in the same directory as a thread file (and then at the top of the source tree if none is found in the current directory) and copies its contents verbatim into an <address> block at the end of the XHTML page (so the contents should be valid XHTML). Information about when the page was last modified and generated is added after the supplied .signature contents, inside the same <address> block.

spin looks for a file named .versions at the top of the source directory and reads it for version information. If it is present, each line should be of the form:

    <product>  <version>  <date>  <time>  <files>

where <product> is the name of a product with a version number, <version> is the version, <date> and <time> specify the time of the last release (in ISO YYYY-MM-DD HH:MM:SS format and the local time zone), and <files> is any number of paths relative to source, separated by spaces, listing source thread files that use \version or \release for <product>. If there are more files than can be listed on one line, additional files can be listed on the next and subsequent lines so long as they all begin with whitespace (otherwise, they'll be taken to be other products). This information is not only used for the \version and \release commands, but also as dependency information. If the date of a release is newer than the timestamp of the output from one of the files listed in <files>, that file will be spun again even if it hasn't changed (to pick up the latest version and release information).
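
As an illustration of that format, a .versions file might look like the following sketch. The product names, version numbers, dates, and paths are all hypothetical. The second line begins with whitespace, so it is read as more files for the first product rather than as a new product:

    example-tool  1.4  2009-03-15  10:30:00  software/example-tool/index.th
        software/example-tool/readme.th
    other-lib     0.9  2008-11-02  18:05:12  software/other-lib/index.th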

spin looks for a file named .rss in each directory it processes. If one is found, spin runs spin-rss on that file, passing the -b option to point to the directory about to be processed. spin does this before processing the files in that directory, so spin-rss can create or update files that will then be processed by spin as normal.

If there is a directory named .git at the top of the source tree, spin will assume that the source is a Git repository and will try to use git log to determine the last modification date of files.

OPTIONS

-d, --delete

After populating the output tree with the results of converting or copying all the files in the source tree, delete all regular files in the output tree that do not have a corresponding file in the source tree. Directories will be mentioned in spin's output but will not be deleted.

-e pattern, --exclude=pattern

Exclude files matching the given regular expression pattern from being converted. This flag may be used multiple times.

-f, --filter

Run spin in filter mode rather than converting a whole tree of files. Thread source is read from stdin and the XHTML output is written to stdout. The signature and navigation links are disabled.

-h, --help

Print out this documentation (which is done simply by feeding the script to perldoc -t).

-o overrides, --overrides=overrides

Load the overrides file using the Perl do command. This file should contain Perl code that overrides or adds to the Perl code that's part of spin. It can be used to define new commands or change the behavior of existing commands.

-s url, --style-url=url

The base URL for style sheets. All style sheets specified in \heading commands will be considered to be relative to this URL and this URL will be prepended to them (otherwise, they'll be referred to as if they're in the same directory as the generated file). This will similarly be used as the base URL to style sheets for the output of cl2xhtml, cvs2xhtml, and faq2html.

-v, --version

Print out the version of spin and exit.

THREAD LANGUAGE

Basic Syntax

A thread file is mostly plain ASCII text with a blank line between paragraphs. There is no need to explicitly mark paragraphs; paragraph boundaries will be inferred from the blank line between them and the appropriate <p> tags will be added to the HTML output. There is no need to escape any character except \ (which should be written as \\) and an unbalanced [ or ] (which should be written as \entity[91] or \entity[93] respectively). Escaping [ or ] is not necessary if the brackets are balanced within the paragraph, and therefore is only rarely needed.

Commands begin with \. For example, the command to insert a line break (corresponding to the <br> tag in HTML) is \break. If the command takes arguments, they are enclosed in square brackets after the command. If there are multiple arguments, they are each enclosed in square brackets and follow each other. Any amount of whitespace (but nothing else) is allowed between the command and the arguments, or between the arguments. So, for example, all of the following are entirely equivalent:

    \link[index.html][Main page]
    \link  [index.html]  [Main page]

    \link[index.html]
    [Main page]

    \link
    [index.html]
    [Main page]

(\link is a command that takes two arguments.)

Commands can take multiple paragraphs of text as arguments in some cases (for things like list items). Commands can be arbitrarily nested.

Some commands take an additional optional argument which specifies the class attribute for that HTML tag, for use with style sheets, or the id attribute, for use with style sheets or as an anchor. That argument is enclosed in parentheses and placed before any other arguments. If the argument begins with #, it will be taken to be an id. Otherwise, it will be taken as a class. For example, a first-level heading is normally written as:

    \h1[Heading]

(with one argument). Either of the following will add a class attribute of header to that HTML container that can be referred to in style sheets:

    \h1(header)[Heading]
    \h1  (header)  [Heading]

and the following would add an id attribute of intro to the heading so that it could be referred to with the anchor #intro:

    \h1(#intro)[Introduction]

Note that the heading commands have special handling for id attributes; see below for more details.

Basic Format

There are two commands that are required to occur in every document. The first is \heading, which must occur before any regular page text. It takes two arguments, the first of which is the page title (the title that shows up in the window title bar for the browser and is the default text for bookmarks, not anything that's displayed as part of the body of the page) and the second of which is the style sheet to use. If there is no style sheet for this page, the second argument should be empty ([]).

The second required command is \signature, which must be the last command in the file. \signature will take care of appending the signature, appending navigation links, closing any open blocks, and any other cleanup that has to happen at the end of a generated HTML page.

It is also highly recommended, if you are using Subversion, CVS, or RCS for revision control, to put \id[$Id$] as the first command in each file. In Subversion, you will also need to enable keyword expansion with svn propset svn:keywords Id file. spin will then take care of putting the last modified date in the footer for you based on the Id timestamp (which may be more accurate than the last modified time of the thread file). If you are using Git, you don't need to include anything special in the thread source; as long as the source directory is the working tree of a Git repository, spin will use Git to determine the last modification date of the file.

You can include other files with the \include command, although it has a few restrictions. The \include command must appear either at the beginning of the file or after a blank line and should be followed by a blank line, and you should be careful not to include the same file recursively. Thread files will not be automatically respun when included files change, so you will need to touch the thread file to force it to be respun.
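
As a sketch of how the required commands fit together, a minimal thread file might look like the following. The title and body text are hypothetical, the style sheet argument is left empty, and \id is omitted on the assumption that the source is not kept in Subversion, CVS, or RCS. (\h1 is one of the block commands described below.)

    \heading[Example Page][]

    \h1[Example Page]

    This is an ordinary paragraph of body text.

    \signature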

Block Commands

Block commands are commands that should occur in a paragraph by themselves, not contained in a paragraph with other text. They indicate high-level structural elements of the page. Three of them were already discussed above:

\heading[<title>][<style>]

As described above, this sets the page title to <title> and the style sheet to <style>. If the -s option was given, that base URL will be prepended to <style> to form the URL for the style sheet; otherwise, <style> will be used verbatim as a URL.

\id[$Id$]

Tells spin the Subversion, CVS, or RCS revision number and time. This string is embedded verbatim in an HTML comment near the beginning of the generated output as well as used for the last modified information added by the \signature command. For this command to behave properly, it must be given before \heading.

\include[<file>]

Include <file> after the current paragraph. If multiple files are included in the same paragraph, they're included in reverse order, but this behavior may change in later versions of spin. It's strongly recommended to always put the \include command in its own paragraph. Don't put \heading or \signature into an included file; the results won't be correct.

Here are the rest of the block commands. Any argument of <text> can be multiple paragraphs and contain other embedded block commands (so you can nest a list inside another list, for example).

\block[<text>]

Put text in an indented block, equivalent to <blockquote> in HTML. Used primarily for quotations or things like license statements embedded in regular text.

\bullet[<text>]

<text> is formatted as an item in a bullet list. This is like <li> inside <ul> in HTML, but the surrounding list tags are inferred automatically and handled correctly when multiple \bullet commands are used in a row. Normally, <text> is treated like a paragraph.

If used with a class attribute of packed, such as with:

    \bullet(packed)[First item]

then the <text> argument will not be treated as a paragraph and will not be surrounded in <p> tags. No block commands should be used inside this type of \bullet command. This variation will, on most browsers, not put any additional whitespace around the line and will look better for bulleted lists where each item is a single line.

\desc[<heading>][<text>]

An element in a description list, where each item has a tag <heading> and an associated body text of <text>, like <dt> and <dd> in HTML. As with \bullet, the <dl> tags are inferred automatically.

\h1[<heading>] .. \h6[<heading>]

Level one through level six headings, just like <h1> .. <h6> in HTML. If given an id argument, such as:

    \h1(#anchor)[Heading]

then not only will an id attribute be added to the <h1> container but the text of the heading will also be enclosed in an <a name> container to ensure that #anchor can be used as an anchor in a link even in older browsers that don't understand id attributes. This is special handling that only works with \h1 through \h6, not with other commands.

\number[<text>]

<text> is formatted as an item in a numbered list, like <li> inside <ol> in HTML. As with \bullet and \desc, the surrounding tags are inferred automatically. As with \bullet, a class attribute of packed will omit the paragraph tags around <text> for better formatting with a list of short items. See the description under \bullet for more information.
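
For example, the following sketch nests a packed numbered list inside a bullet item. The item text is hypothetical, and each nested block command is kept in its own paragraph, as block commands require:

    \bullet[Install the software by following these steps.

    \number(packed)[Unpack the distribution]

    \number(packed)[Run the installer]]

    \bullet[Read the documentation]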

\pre[<text>]

Insert <text> preformatted, preserving spacing and line breaks. This uses the HTML <pre> tag, and therefore is normally also shown in a fixed-width font by the browser.

When using \pre inside indented blocks or lists, it's worth bearing in mind how browsers show indentation with \pre. Normally, the browser indents text inside \pre relative to the enclosing block, so you should only put as much whitespace before each line in \pre as those lines should be indented relative to the enclosing text. Unfortunately, lynx indents relative to the left margin, so it's difficult to use indentation that looks correct in both lynx and other browsers.

\quote[<text>][<author>][<work>]

Used for quotes at the top of a web page. The whole text will be enclosed in a <blockquote> tag with class quote for style sheets. <text> may be multiple paragraphs, and then a final paragraph will be added (with class attribution) containing the author, a comma, and the <work> inside <cite> tags. <work> can be omitted by passing an empty third argument. If \quote is given a class argument of broken, <text> will be treated as a series of lines and a line break (<br />) will be added to the end of each line.
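
As an illustration (the quoted lines, author, and work are hypothetical), a two-line quote with its line breaks preserved could be written as:

    \quote(broken)[The first line of the quotation
    The second line of the quotation]
    [Author Name][Title of the Work]

and a quote with no work, using an empty third argument, as:

    \quote[The text of the quotation.][Author Name][]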

\rss[<url>][<title>]

Indicates that this page has a corresponding RSS feed at the URL <url>. The title of the RSS feed (particularly important if a page has more than one feed) is given by <title>. The feed links are included in the page header output by \heading, so this command must be given before \heading to be effective.

\rule

A horizontal rule, <hr> in HTML.

\sitemap

Inserts an unordered list showing the structure of the whole site, provided that a .sitemap file was found at the root of the source directory and spin wasn't run as a filter or on a single file. If .sitemap wasn't found or if spin is running as a filter or on a single file, inserts nothing.

Be aware that spin doesn't know whether a file contains a \sitemap command and hence won't know to respin a file when the .sitemap file has changed. You will need to touch the source file to force it to be respun.

\table[<options>][<body>]

Creates a table. The <options> text is added verbatim to the <table> tag in the generated HTML, so it can be used to set various HTML attributes like cellpadding that aren't easily accessible in a portable fashion from style sheets. <body> is the body of the table, which should generally consist exclusively of \tablehead and \tablerow commands.

The descriptions are somewhat hard to read, so here's a sample table:

    \table[rules="cols" border="1"][
        \tablehead [Older Versions]     [Webauth v3]
        \tablerow  [suauthSidentSrvtab] [WebAuthKeytab]
        \tablerow  [suauthFailAction]   [WebAuthLoginURL]
        \tablerow  [suauthDebug]        [WebAuthDebug]
        \tablerow  [suauthProxyHeader]  [(use mod_headers)]
    ]

The table support is currently preliminary. I've not yet found a good way of expressing tables, and it's possible that the syntax will change later.

\tablehead[<cell>][<cell>] ...

A heading row in a table. \tablehead takes any number of <cell> arguments, wraps them all in a <tr> table row tag, and puts each cell inside <th>. If a cell should have a certain class attribute, the easiest way to do that is to use a \class command around the <cell> text, and the class attribute will be "lifted" up to become an attribute of the enclosing <th> tag.

\tablerow[<cell>][<cell>] ...

A regular row in a table. \tablerow takes any number of <cell> arguments, wraps them all in a <tr> table row tag, and puts each cell inside <td>. If a cell should have a certain class attribute, the easiest way to do that is to use a \class command around the <cell> text, and the class attribute will be "lifted" up to become an attribute of the enclosing <td> tag.
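
For instance, a row whose first cell carries a class could be sketched as follows, placed inside the body of a \table command. The class name warning is hypothetical:

    \tablerow [\class(warning)[suauthDebug]] [WebAuthDebug]

Per the description above, the class attribute is lifted from the \class command to the tag enclosing that cell.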

Inline Commands

Inline commands can be used in the middle of a paragraph intermixed with other text. Most of them are simple analogs to their HTML counterparts. All of the following take a single argument (the enclosed text) and map to simple HTML tags:

    \bold       <b></b>                 (usually use \strong)
    \cite       <cite></cite>
    \code       <code></code>
    \emph       <em></em>
    \italic     <i></i>                 (usually use \emph)
    \strike     <strike></strike>       (should use styles)
    \strong     <strong></strong>
    \sub        <sub></sub>
    \sup        <sup></sup>
    \under      <u></u>                 (should use styles)

Here are the other inline commands:

\break

A forced line break, <br> in HTML.

\class[<text>]

Does nothing except wrap <text> in an HTML <span> tag. The only purpose of this command is to use it with a class argument that can be used in a style sheet. For example, you might write:

    \class(red)[A style sheet can make this text red.]

so that the style sheet can then refer to class red and change its color.

\entity[<code>]

An HTML entity with code <code>. Basically, becomes &<code>; in the generated HTML, or &#<code>; if <code> is entirely numeric. About the only time you'd need to use this is for non-ASCII characters (European names, for example) or if you need a literal [ or ] that isn't balanced.
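
For example, using the named entity eacute and the numeric code 91 (a left square bracket) in some hypothetical text:

    Ren\entity[eacute] Descartes
    An unbalanced \entity[91] in a sentence.

Following the rule above, the two \entity commands become &eacute; and &#91; respectively in the generated HTML.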

\image[<url>][<text>]

Insert an inline image. <text> is the alt text for the image (which will be displayed on non-graphical browsers). Height and width attributes are added automatically, assuming that <url> is a relative URL in the same tree of files as the thread source.

\link[<url>][<text>]

Create a link to <url> with link text <text>. Basically <a href=""></a>.

\release[<product>]

Replaced with the date portion of the version information for <product>, taken from the .versions file at the top of the source tree. The date will be returned in the UTC time zone, not the local time zone.

\size[<file>]

Replaced with the size of <file> in B, KB, MB, GB, or TB as is most appropriate, without decimal places. The next larger unit is used if the value is larger than 1024. 1024 is used as the scaling factor, not 1000.

\version[<product>]

Replaced with the version number for <product>, taken from the .versions file at the top of the source tree.

Defining New Macros

One of the important things that thread supports over HTML is the ability to define new macros on the fly. If there are particular constructs that are frequently used on the page, you can define a macro at the top of that page and then just use it repeatedly throughout the page.

A string can be defined with the command:

    \=[<string>][<value>]

where <string> is the name that will be used (can only be alphanumerics plus underscore) and <value> is the value that string will expand into. Any later occurrence of \=<string> in the file will be replaced with <value>. For example:

    \=[HOME][http://www.stanford.edu/]

will cause any later occurrences of \=HOME in the file to be replaced with the text http://www.stanford.edu/. This can be useful for things like URLs for links, so that all the URLs can be collected at the top of the page for easy updating.
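
A later link could then be written as in the following sketch (the link text is hypothetical):

    \link[\=HOME][Stanford home page]

which is equivalent to writing \link[http://www.stanford.edu/][Stanford home page].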

A new macro can be defined with the command:

    \==[<name>][<arguments>][<definition>]

where <name> is the name of the macro (again consisting only of alphanumerics or underscore), <arguments> is the number of arguments that it takes, and <definition> is the definition of the macro. When the macro is expanded, any occurrence of \1 in the definition is replaced with the first argument, any occurrence of \2 with the second argument, and so forth.

For example:

    \==[bolddesc] [2] [\desc[\bold[\1]][\2]]

defines a new macro \bolddesc that takes the same arguments as the regular \desc command but always wraps the first argument, the heading, in \bold (<b> in HTML).
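
Using that definition, a description list entry could be written as in this sketch (the term and its explanation are hypothetical):

    \bolddesc[Term][The explanation of the term.]

which, by the substitution rule above, expands to \desc[\bold[Term]][The explanation of the term.].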

BUGS

Currently, the style sheets for cl2xhtml, cvs2xhtml, faq2html, and pod2thread are hard-coded into this program to fit my web pages. This makes this program awkward for others to use, since the style sheet has to be specified in every pointer file if they're using different names.

There is no way to configure how navigation links are added if the sitemap support is used.

\include needs some work to make it behave as expected without requiring that each \include be in its own paragraph. It should be possible to support \heading and \signature in included files without breaking the navigation link support.

\sitemap can only be used at the top of the web site or the links would be wrong. It needs to do relative adjustment of the links.

The sitemap support currently only adds previous, next, up, and top links in the header of the generated web page. Most browsers that support this functionality also support first and last links, and the information is available in the sitemap file to generate those. They should also be included.

SEE ALSO

cl2xhtml(1), cvs2xhtml(1), faq2html(1), pod2thread(1), spin-rss(1)

The XHTML 1.0 standard at <http://www.w3.org/TR/xhtml1/>.

Current versions of this program are available from my web tools page at <http://www.eyrie.org/~eagle/software/web/>, as are copies of all of the above-mentioned programs.

AUTHOR

Russ Allbery <rra@stanford.edu>

COPYRIGHT AND LICENSE

Copyright 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Russ Allbery <rra@stanford.edu>.

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

Last spun 2013-07-01 from POD modified 2013-01-04