<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE html [
<!ENTITY exslt-ns "http://exslt.org/common">
<!ENTITY lit-ns   "http://rdfcat.sf.net/ns/literate">
<!ENTITY xsl-ns   "http://www.w3.org/1999/XSL/Transform">
<!ATTLIST blockcode id ID #IMPLIED
                    lit:frag CDATA ""
                    xml:space (preserve) #FIXED 'preserve'>
]>
<?xml-stylesheet type="text/xml" href="to-xhtml1.xsl" ?>

<html xmlns="http://www.w3.org/200206/xhtml2/"
      xmlns:fn="http://rdfcat.sf.net/ns/xhtml-footnotes/"
      xmlns:lit="&lit-ns;"
      xmlns:xsl="&xsl-ns;"
      xmlns:exslt="&exslt-ns;">

<head>
<title>Steve’s XML Literate Programming System</title>
<meta property="vc:date">$LastChangedDate: 2006-03-09 18:39:59 -0500 (Thu, 09 Mar 2006) $</meta>
<meta property="vc:author">$Author: stevecheng $</meta>
<meta property="vc:rev">$Rev: 26 $</meta>
</head>

<body>
<h1>Steve’s XML Literate Programming System</h1>

<h2>Why yet another literate programming system?</h2>

<p>
Well, first, there is really no penalty to inventing another literate
programming system.  It does not stop you from using the existing
literate programming systems out there.  And you see
this literate programming system only when you
want to study or develop a program that uses it; 
it is no more onerous
than having to deal with a particular program’s build system,
documentation guidelines, etc.
</p>

<p>
The present literate programming system prides itself on not
requiring that documents be “woven”.
Since literate programming is about writing programs
as essays for human readers, it follows that there ought
to be no conceptual difference between the essay document itself
and the output document.  The distinction
was only necessary in TeX-based literate programming systems,
for TeX documents cannot have code inside them,
so the code had to be extracted out in a weaving step.
On the other hand, XML documents can mix different namespaces
with no problem.  </p>

<p>
The other literate programming systems, such as
<a href="http://xml-lit.sourceforge.net/">XML-Lit</a>,
<a href="http://xmlp.sourceforge.net/">xmLP</a>,
and <a href="http://nwalsh.com/docs/articles/xml2002/lp/paper.html">XWeb</a>,
still retain the weaving step, because they are geared to output DocBook,
which traditionally did not allow XML namespaces.
But we have had XML namespaces for many years now, and they are handled well
by most XML tools, including XSLT stylesheets.
How the output document should be formatted
should be up to the output stylesheet — whether the output be
to XHTML, XSL-FO, LaTeX, or any other scheme.
</p>

<p>
The advantage of omitting the weaving step is obvious:
the literate programming system becomes smaller, less complex,
and easier to use.
The present literate programming system runs on
one XSLT stylesheet that is only about 3500 bytes long.
The XSLT stylesheet that translates this document to XHTML
is about 4400 bytes long.
</p>

<p>
The present literate programming system
has one additional feature over the previously mentioned systems:
fragment pointers are URIs with fragment identifiers, 
rather than plain XML ID references.
So a program written with this system can be 
split across multiple XML documents (in the style of HTML web pages),
which can even be accessed across a network.
You are not restricted to DocBook-style monolithic documents<fn:fn>In my <a href="http://rdfcat.sourceforge.net/doc-format">discussion of the documentation format in RDFcat</a>, I argue against using monolithic document types such
as DocBook.</fn:fn>.
</p>

<h2>Obtaining the sources</h2>

<p>
The sources to this system may be obtained at
<a href="http://rdfcat.sourceforge.net/literate/">http://rdfcat.sourceforge.net/literate/</a>.
</p>

<h2>The tangling XSLT stylesheet</h2>

<p>
As expected of any literate programming system,
the program implementing the system is written
with the system itself.  
</p>

<p>
For the convenience of the exposition,
from now on we assume the following XML namespaces are in effect:
</p>

<table>
<thead>
<tr><th>Prefix</th><th>Namespace URI</th></tr>
</thead>
<tbody>
<tr><td><samp>lit</samp></td><td>&lit-ns;</td></tr>
<tr><td><samp>xsl</samp></td><td>&xsl-ns;</td></tr>
<tr><td><samp>exslt</samp></td><td>&exslt-ns;</td></tr>
</tbody>
</table>

<p>
The XSLT stylesheet follows.  
</p>

<blockcode lit:type="xml"
><xsl:stylesheet 
  xmlns:xsl="&xsl-ns;"
  xmlns:lit="&lit-ns;"
  xmlns:exslt="&exslt-ns;"
  exclude-result-prefixes="lit exslt"
  extension-element-prefixes="exslt"
  version='1.0'
  xml:lang="en">

<xsl:key name="src" match="*[@lit:src or @lit:type]" use="1" />

<xsl:template match="/">
  <xsl:apply-templates select="key('src', 1)" />
</xsl:template>

<a href="#tangle-text-start" lit:href="#tangle-text-start">Start tangling text document</a>
<a href="#tangle-text" lit:href="#tangle-text">Text tangling copy templates</a>

<a href="#tangle-xml-start" lit:href="#tangle-xml-start">Start tangling XML document</a>
<a href="#tangle-xml" lit:href="#tangle-xml">XML tangling copy templates</a>

<a href="#suppress-comments" lit:href="#suppress-comments">Suppressing comments in tangled document</a>
</xsl:stylesheet>
</blockcode>

<p>
A literate programming document may use
any set of elements.
(For example, this literate programming document is written in XHTML 2.0.)
Literate programming processing is indicated by attributes from
the XML namespace <samp>lit</samp>.
</p>

<p>
Each element in the input XML document
that carries a <samp>lit:src</samp> or <samp>lit:type</samp>
attribute is processed in turn.
</p>

<p>
The <samp>lit:src</samp> attribute specifies,
using a URI, the location of the tangled
program.  It is mandatory except for the special
case when only one XML document is to be output to the
default output location.
</p>

<p>
The <samp>lit:type</samp> attribute takes either
the value <samp>xml</samp> or <samp>text</samp>,
to indicate that the desired output format is XML or plain text,
respectively.  The default is <samp>text</samp>.
</p>
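
<p>
As a minimal sketch of how these attributes fit together,
the following fragment would be tangled as plain text.
The file name <samp>hello.sh</samp> and the script itself are hypothetical,
chosen only for illustration.  The <samp>lit:encoding</samp> attribute is
optional; going by the stylesheet presented in this document,
it defaults to <samp>utf-8</samp>.

<pre
>&lt;blockcode lit:src="hello.sh" lit:type="text" lit:encoding="utf-8"&gt;#!/bin/sh
echo "hello, world"
&lt;/blockcode&gt;
</pre>

Since <samp>text</samp> is the default, the <samp>lit:type</samp>
attribute could be omitted here.
</p>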



<h3>Tangling XML-based documents</h3>

<p>
Here the tangled document is an XML document,
for example, an XSLT stylesheet.
</p>

<p>
The following template processes XML-based fragments,
as described previously.
</p>

<blockcode id="tangle-xml-start"
><xsl:output method="xml" encoding="utf-8" />

<xsl:template match="*[@lit:src and @lit:type='xml']" priority="2.0">
  <xsl:variable name="encoding">
    <xsl:choose>
      <xsl:when test="@lit:encoding"><xsl:value-of select="@lit:encoding" /></xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <exslt:document method="xml"
                  encoding="{$encoding}"
                  href="{@lit:src}">
    <xsl:apply-templates mode="tangle-xml" />
  </exslt:document>
</xsl:template>

<xsl:template match="*[@lit:type='xml']">
  <xsl:apply-templates mode="tangle-xml" />
</xsl:template>
</blockcode>

<blockcode id="tangle-xml"
><xsl:template match="@*|node()" mode="tangle-xml">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" mode="tangle-xml" />
  </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="tangle-xml">
  <xsl:value-of select="." />
</xsl:template>

<xsl:template match="*[@lit:href]" mode="tangle-xml">
  <a href="#tangle-get-fragment" lit:href="#tangle-get-fragment">Load in the fragment</a>
  <xsl:apply-templates select="$target/node()" mode="tangle-xml" />
</xsl:template>
</blockcode>

<h3>Tangling text-based documents</h3>

<p>
The following template processes plain-text-based fragments,
as described previously.
</p>

<blockcode id="tangle-text-start" 
><xsl:template match="*[@lit:src]">
  <xsl:variable name="encoding">
    <xsl:choose>
      <xsl:when test="@lit:encoding"><xsl:value-of select="@lit:encoding" /></xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <exslt:document method="text"
                  encoding="{$encoding}"
                  href="{@lit:src}">
    <xsl:apply-templates mode="tangle-text" />
  </exslt:document>
</xsl:template>
</blockcode>

<blockcode id="tangle-text"
><xsl:template match="*" mode="tangle-text">
  <xsl:apply-templates mode="tangle-text" />
</xsl:template>

<xsl:template match="text()" mode="tangle-text">
  <xsl:value-of select="." />
</xsl:template>

<xsl:template match="*[@lit:href]" mode="tangle-text">
  <a href="#tangle-get-fragment" lit:href="#tangle-get-fragment">Load in the fragment</a>
    
  <xsl:apply-templates select="$target/node()" mode="tangle-text" />
</xsl:template>
</blockcode>

<h3>Accessing fragment pointers</h3>

<p>
A fragment (element processed by the literate programming system) 
may include another fragment through an element
with a <samp>lit:href</samp> attribute.
The value of this attribute is a URI that points
to the target fragment to be included.
</p>
   
<p>
The target fragment must be an element
with a <samp>lit:frag</samp> attribute.
This restriction exists to allow some optimization:
any elements without this attribute (or <samp>lit:src</samp>
or <samp>lit:type</samp>) may be ignored by the literate 
programming system and not needlessly kept in memory during processing.
</p>

<p>
The most common type of fragment pointer (URI for <samp>lit:href</samp>)
is the ID reference of the form <samp>#<var>name</var></samp>.
Note that <var>name</var> must be formally declared as an ID
in the DTD of the source XML document; having an attribute with the name
<samp>id</samp> without declaring it formally as an ID in the DTD
does <em>not</em> suffice.
</p>

<p>
If you use the element type
<samp><var>elem</var></samp> for your fragments, 
then the two conditions just discussed
can be automatically satisfied if you put an attribute list declaration 
like:

<pre
>&lt;!ATTLIST <var>elem</var> id       ID    #IMPLIED
               lit:frag CDATA "" &gt;
</pre>

as part of the internal DTD of the XML source document.
</p>
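
<p>
Putting these pieces together, a pointer and its target might look like
the following sketch.  The names <samp>greet.sh</samp> and
<samp>say-hello</samp> are hypothetical, and the sketch assumes that
<samp>blockcode</samp> has been given the attribute list declaration shown above,
so that <samp>id</samp> is a true ID and <samp>lit:frag</samp> is
supplied by default.

<pre
>&lt;blockcode lit:src="greet.sh"&gt;#!/bin/sh
&lt;a href="#say-hello" lit:href="#say-hello"&gt;Print the greeting&lt;/a&gt;
&lt;/blockcode&gt;

&lt;blockcode id="say-hello"&gt;echo "hello, world"
&lt;/blockcode&gt;
</pre>

When tangled, the contents of the <samp>say-hello</samp> fragment
take the place of the <samp>a</samp> element; the link text itself
is not copied into the output.  The plain <samp>href</samp> attribute
is there only for the benefit of human readers.
</p>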

<p>
The following code checks that the URI specified
for <samp>lit:href</samp> can be accessed,
and that the target element has the <samp>lit:frag</samp> attribute.
If the URI cannot be accessed, processing stops with an error message;
if the attribute is missing, only a warning is printed.
</p>

<blockcode id="tangle-get-fragment" 
>  <xsl:variable name="target" select="document(@lit:href, .)" />
  <xsl:if test="count($target)=0">
    <xsl:message terminate="yes">
      <xsl:text>error: fragment </xsl:text>
      <xsl:value-of select="@lit:href" /> 
      <xsl:text> not found or cannot be accessed</xsl:text>
    </xsl:message>
  </xsl:if>
  <xsl:if test="count($target/@lit:frag)=0">
    <xsl:message>
      <xsl:text>warning: encountered fragment pointer to </xsl:text>
      <xsl:value-of select="@lit:href" /> 
      <xsl:text> which does not have lit:frag attribute</xsl:text>
    </xsl:message>
  </xsl:if>
</blockcode>

<h3>Comments</h3>

<p>
Formatted text, to be suppressed in the tangled document, 
may also be put inside fragments.
This is most often used 
to make comments on specific lines of code,
without having to start a new fragment for each block
of commented code.
</p>

<p>
Such formatted text is indicated by a <samp>lit:comment</samp>
attribute.
</p>
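
<p>
For instance, a comment might be attached to a line of a fragment as in
the sketch below.  The element <samp>span</samp>, the file name
<samp>greet.sh</samp>, and the empty attribute value are assumptions
made for illustration; the templates that follow test only for the
presence of <samp>lit:comment</samp>, not for its value.

<pre
>&lt;blockcode lit:src="greet.sh"&gt;#!/bin/sh
&lt;span lit:comment=""&gt;The greeting goes to standard output.&lt;/span&gt;
echo "hello, world"
&lt;/blockcode&gt;
</pre>

In the tangled file only the <samp>span</samp> element and its contents
are dropped; the surrounding text, including line breaks, is kept.
</p>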

<blockcode id="suppress-comments"
><xsl:template match="*[@lit:comment]" mode="tangle-text">
</xsl:template>

<xsl:template match="*[@lit:comment]" mode="tangle-xml">
</xsl:template>
</blockcode>

</body>
</html>
