Steve’s XML Literate Programming System

Why yet another literate programming system?

Well, first, there is really no penalty to inventing another literate programming system. It does not stop you from using the existing literate programming systems out there. And you see this literate programming system only when you want to study or develop a program that uses it; it is no more onerous than having to deal with a particular program’s build systems, documentation guidelines, etc.

The present literate programming system prides itself on not requiring that documents be “woven”. Since literate programming is about writing programs as essays for human readers, it follows that there ought to be no conceptual difference between the essay document itself and the output document. The distinction was only necessary in TeX-based literate programming systems, for TeX documents cannot have code inside them, so the code had to be extracted out in a weaving step. On the other hand, XML documents can mix different namespaces with no problem.

The other literate programming systems, such as XML-Lit, xmLP, and XWeb, still retain the weaving step, because they are geared to output DocBook, which traditionally did not allow XML namespaces. But we have had XML namespaces for many years now, and they are handled well by most XML tools, including XSLT stylesheets. How the output document is formatted should be up to the output stylesheet, whether the output is XHTML, XSL-FO, LaTeX, or any other scheme.

The advantage to omitting the weaving step is obvious: the literate programming system becomes smaller, less complex, and easier to use. The present literate programming system runs on one XSLT stylesheet that is only about 3500 bytes long. The XSLT stylesheet that translates this document to XHTML is about 4400 bytes long.

The present literate programming system has one additional feature over the previously mentioned systems: fragment pointers are URIs with fragment identifiers, rather than plain XML ID references. So a program written with this system can be split across multiple XML documents (in the style of HTML web pages), which can even be accessed across a network. You are not restricted to DocBook-style monolithic documents [1].

Obtaining the sources

The sources to this system may be obtained at http://rdfcat.sourceforge.net/literate/.

The tangling XSLT stylesheet

As expected of any literate programming system, the program implementing the system is written with the system itself.

For the convenience of the exposition, from now on we assume the following XML namespaces are in effect:

Prefix  Namespace URI
lit     http://rdfcat.sf.net/ns/literate
xsl     http://www.w3.org/1999/XSL/Transform
exslt   http://exslt.org/common

The XSLT stylesheet follows.

<xsl:stylesheet exclude-result-prefixes="lit exslt" extension-element-prefixes="exslt" version="1.0" xml:lang="en">

<xsl:key name="src" match="*[@lit:src or @lit:type]" use="1" />

<xsl:template match="/">
  <xsl:apply-templates select="key('src', 1)" />
</xsl:template>

Start tangling text document
Text tangling copy templates

Start tangling XML document
XML tangling copy templates

Suppressing comments in tangled document
</xsl:stylesheet>

A literate programming document may use any set of elements. (For example, this literate programming document is written in XHTML 2.0.) Literate programming processing is indicated by attributes from the XML namespace lit.

Each element in the input XML document that carries a lit:src or lit:type attribute is processed in turn.

The lit:src attribute specifies, using a URI, the location of the tangled program. It is mandatory except for the special case when only one XML document is to be output to the default output location.

The lit:type attribute takes either the value xml or the value text, to indicate that the desired output format is XML or plain text, respectively. The default is text.
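
As an illustration of these two attributes, a source document might mark up its fragments as in the sketch below. The element names doc, p, and code and the output file names are hypothetical; only the lit attributes and the namespace URI come from the system described here.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical source document: doc, p and code are placeholder element
     names, and hello.sh and settings.xml are made-up output locations. -->
<doc xmlns:lit="http://rdfcat.sf.net/ns/literate">
  <p>This script prints a greeting.</p>

  <!-- lit:type is omitted, so the fragment is tangled as plain text. -->
  <code lit:src="hello.sh">echo "hello, world"
</code>

  <p>Its settings live in a separate XML file.</p>

  <!-- lit:type="xml" asks for the child elements to be copied as XML. -->
  <code lit:src="settings.xml" lit:type="xml"><settings verbose="yes" /></code>
</doc>
```

Both code elements carry lit:src, so each would be written to its own output location; the p elements carry no lit attributes and are not tangled.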

Tangling XML-based documents

Here the tangled document is an XML document, for example an XSLT stylesheet.

The following template processes XML-based fragments, as described previously.

<xsl:output method="xml" encoding="utf-8" />

<xsl:template match="*[@lit:src and @lit:type='xml']" priority="2.0">
  <xsl:variable name="encoding">
    <xsl:choose>
      <xsl:when test="@lit:encoding"><xsl:value-of select="@lit:encoding" /></xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <exslt:document method="xml" encoding="{$encoding}" href="{@lit:src}">
    <xsl:apply-templates mode="tangle-xml" />
  </exslt:document>
</xsl:template>

<xsl:template match="*[@lit:type='xml']">
  <xsl:apply-templates mode="tangle-xml" />
</xsl:template>

<xsl:template match="@*|node()" mode="tangle-xml">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" mode="tangle-xml" />
  </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="tangle-xml">
  <xsl:value-of select="." />
</xsl:template>

<xsl:template match="*[@lit:href]" mode="tangle-xml">
  Load in the fragment
  <xsl:apply-templates select="$target/node()" mode="tangle-xml" />
</xsl:template>
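
The template matching *[@lit:type='xml'] without a lit:src test covers the special case mentioned earlier, in which a single XML document goes to the default output location. A sketch of such a source document follows; the element names doc, p, listing, and greeting are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical source document with a single XML fragment and no lit:src. -->
<doc xmlns:lit="http://rdfcat.sf.net/ns/literate">
  <p>The greeting is stored as a one-element XML document.</p>
  <!-- No lit:src here, so the priority 2.0 template does not match and the
       plain lit:type='xml' template handles this element instead. -->
  <listing lit:type="xml"><greeting>hello, world</greeting></listing>
</doc>
```

Under the templates shown, the children of listing would be copied to the stylesheet's main output rather than to a file named by lit:src.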

Tangling text-based documents

The following template processes plain-text-based fragments, as described previously.

<xsl:template match="*[@lit:src]">
  <xsl:variable name="encoding">
    <xsl:choose>
      <xsl:when test="@lit:encoding"><xsl:value-of select="@lit:encoding" /></xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <exslt:document method="text" encoding="{$encoding}" href="{@lit:src}">
    <xsl:apply-templates mode="tangle-text" />
  </exslt:document>
</xsl:template>

<xsl:template match="*" mode="tangle-text">
  <xsl:apply-templates mode="tangle-text" />
</xsl:template>

<xsl:template match="text()" mode="tangle-text">
  <xsl:value-of select="." />
</xsl:template>

<xsl:template match="*[@lit:href]" mode="tangle-text">
  Load in the fragment
  <xsl:apply-templates select="$target/node()" mode="tangle-text" />
</xsl:template>

Accessing fragment pointers

A fragment (element processed by the literate programming system) may include another fragment through an element with a lit:href attribute. The value of this attribute is a URI that points to the target fragment to be included.

The target fragment must be an element with a lit:frag attribute. This restriction exists to allow some optimization: any elements without this attribute (or lit:src or lit:type) may be ignored by the literate programming system and not needlessly kept in memory during processing.

The most common type of fragment pointer (the URI given in lit:href) is the ID reference of the form #name. Note that name must be formally declared as an ID in the DTD of the source XML document; having an attribute with the name id without declaring it formally as an ID in the DTD does not suffice.

If you use the element type elem for your fragments, then the two conditions just discussed can be automatically satisfied if you put an attribute list declaration like:

<!ATTLIST elem id       ID    #IMPLIED
               lit:frag CDATA "" >

as part of the internal DTD of the XML source document.
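
Putting the pieces together, a source document that uses such a declaration might look like the sketch below. The root element doc, the output name greet.sh, and the ID body are hypothetical; the attribute list declaration is the one given above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE doc [
  <!-- Declares id as an ID and gives every elem an empty lit:frag default. -->
  <!ATTLIST elem id       ID    #IMPLIED
                 lit:frag CDATA "" >
]>
<doc xmlns:lit="http://rdfcat.sf.net/ns/literate">
  <!-- Top-level fragment: tangled as text into the (made-up) file greet.sh. -->
  <elem lit:src="greet.sh">#!/bin/sh
<elem lit:href="#body">Print the greeting</elem>
</elem>

  <!-- Target fragment: reachable as #body because id is a declared ID. -->
  <elem id="body">echo "hello, world"
</elem>
</doc>
```

The text inside the pointing element ("Print the greeting") serves only as a label for human readers; the tangling templates replace the whole element with the content of the target. A pointer into a separate file would take the same form with a document URI in front of the fragment identifier, for example lit:href="part2.xml#body" (a hypothetical file name), provided that file also declares id as an ID.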

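The following code checks that the URI specified for lit:href can be accessed, and that the target element has the lit:frag attribute. If the URI cannot be accessed, processing stops with an error message; if the target lacks the lit:frag attribute, a warning is printed.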

  <xsl:variable name="target" select="document(@lit:href, .)" />
  <xsl:if test="count($target)=0">
    <xsl:message terminate="yes">
      <xsl:text>error: fragment </xsl:text>
      <xsl:value-of select="@lit:href" /> 
      <xsl:text> not found or cannot be accessed</xsl:text>
    </xsl:message>
  </xsl:if>
  <xsl:if test="count($target/@lit:frag)=0">
    <xsl:message>
      <xsl:text>warning: encountered fragment pointer to </xsl:text>
      <xsl:value-of select="@lit:href" /> 
      <xsl:text> which does not have lit:frag attribute</xsl:text>
    </xsl:message>
  </xsl:if>

Comments

Formatted text, to be suppressed in the tangled document, may also be put inside fragments. This is most often used to make comments on specific lines of code, without having to start a new fragment for each block of commented code.

Such formatted text is indicated by a lit:comment attribute.

<xsl:template match="*[@lit:comment]" mode="tangle-text">
</xsl:template>

<xsl:template match="*[@lit:comment]" mode="tangle-xml">
</xsl:template>
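
A fragment with an embedded comment might be written as in the sketch below. The element names doc, code, and note and the file name hello.c are hypothetical, and the attribute value "yes" is an assumption: the templates above test only for the presence of lit:comment, not for any particular value.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical source document: doc, code and note are placeholder names. -->
<doc xmlns:lit="http://rdfcat.sf.net/ns/literate">
  <code lit:src="hello.c">#include &lt;stdio.h&gt;
<note lit:comment="yes">The standard I/O header declares printf.</note>
int main(void)
{
    printf("hello, world\n");
    return 0;
}
</code>
</doc>
```

When this fragment is tangled, the note element and its text are dropped, while the surrounding C source is kept.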

Notes

  1. In my discussion of the documentation format in RDFcat, I argue against using monolithic document types such as DocBook.

Steve Cheng <stevecheng@users.sourceforge.net>