3. About this document

This program, xhtml2to1, is a literate program; that is, its “source” is an essay containing its computer “source code” annotated with English explanations for human readers.

3.1. Why literate programming?

We discuss why xhtml2to1 uses literate programming[1].

3.1.1. Correctness of programs

Firstly, literate programming is a way of producing documentation for programs, something sorely lacking in open source software in general.

The author was also inclined to employ literate programming because of his personal tastes: he is a student of mathematics and is used to reading and writing mathematical expositions. He is constantly dismayed at the bugginess of many computer programs. In contrast, almost all widely accepted mathematics is free of errors. Moreover, mathematicians take errors in published work very seriously when they are discovered, unlike most computer programmers.

Why is it so hard to achieve correctness in computer programs? One reason is that computer programs are, by their nature, too formal. A computer precisely follows a set of instructions written in a formal syntax. Make one small typo or omission in a C program, and it dumps core.

We want the computer to do what we mean it to do, not exactly what we say. Obviously we cannot forgo programming languages with formal syntax altogether, so the next best thing is to complement “code”[2] written in a computer programming language with exposition in a natural language. This is not unlike mathematical writing, where even proofs, which are expected to be formal, are written in English (or another natural language) with formulae interspersed throughout.

The author believes literate programming leads to more correct programs — after all, TeX, the first literate program, is mostly bug-free — and this alone should justify the method.

3.1.2. Design of programs

Literate programming also helps with the elegant design of programs. A logically correct program that has a baroque design, or is difficult to use, could be as bad as a buggy program. A person who wants to write a good program must have some confidence that its design is good. And what could be more convincing than writing the program as an essay that presents arguments for its design (provided, of course, that the program also works)?

3.1.3. Is literate programming too hard?

There is no doubt that literate programming is harder, at least at first, than “normal programming”. The author recognizes that most literate programming systems out there are just too clumsy to use. That is why the author has written his own literate programming system to try to fix these problems.

In the case of xhtml2to1, since its primary implementation language is XSLT, which has an XML-based syntax, fragments of the program can simply be embedded inside an XHTML 2.0 document; no extra wrappers are necessary. When developing xhtml2to1, the author tends to think through the design of the program in his head, meanwhile experimenting with the implementation in XSLT (that is, going through write-compile-test cycles). Once the author gets a part of the program working right and just needs to polish the results, he takes the opportunity to add the English explanations of what he has done, to set down concretely the vague notions in his mind and make sure he did not miss anything substantial.
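Because the essay and the stylesheet are both XML, pulling the program back out of the essay (what Knuth’s WEB system calls “tangling”) can be a plain tree walk. The author’s own tool is not described here, so the following Python sketch is hypothetical: it assumes every embedded fragment is a top-level XSLT element such as xsl:template, and that wrapping the collected fragments in one xsl:stylesheet element yields the program. The names `tangle` and `_collect_xsl` are illustrative only.

```python
import copy
import xml.etree.ElementTree as ET

# XSLT 1.0 namespace; registering it makes the output use the usual "xsl:" prefix.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)


def _collect_xsl(elem, found):
    """Depth-first walk that stops at the first XSLT element on each branch."""
    for child in elem:
        if child.tag.startswith("{%s}" % XSL_NS):
            fragment = copy.deepcopy(child)
            fragment.tail = None  # drop the essay prose that follows the fragment
            found.append(fragment)
        else:
            _collect_xsl(child, found)  # keep searching inside essay markup


def tangle(document_text):
    """Gather the XSLT fragments embedded in an XML essay into one stylesheet."""
    root = ET.fromstring(document_text)
    stylesheet = ET.Element("{%s}stylesheet" % XSL_NS, {"version": "1.0"})
    fragments = []
    _collect_xsl(root, fragments)
    stylesheet.extend(fragments)
    return ET.tostring(stylesheet, encoding="unicode")
```

For example, given an essay section containing an xsl:template named greeting followed by more prose, `tangle` returns a stylesheet holding only that template; the surrounding prose is left behind because only XSLT elements are copied and each copy’s tail text is cleared.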

So literate programming need not be much harder than developing the code directly!

3.2. xhtml2to1 is formatted with itself

The above arguments are not just theoretical ruminations. Precisely because xhtml2to1 is written as a literate program, we can use the xhtml2to1 stylesheets to format the xhtml2to1 document itself. We thus automatically test both the design and the implementation of xhtml2to1 on a non-trivial application.

For the reader interested in the technical workings of this program, almost every portion of the source code is presented nicely in this document with copious English explanations. The author hopes the reader will enjoy reading this document (the literate program) as much as the author has enjoyed writing it.

3.3. How to build the xhtml2to1 distribution

The build system requires SCons.

For the moment, the build system of xhtml2to1 also assumes that the libxslt XSLT processor has been installed. If that is the case, just type scons on the command line to build everything. Otherwise, you will have to edit the SConstruct file to suit your XSLT processor.

After building, which should be fairly quick, you will have XSL files that can be used to format other documents, as well as HTML files for the document you are reading now.
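Editing the SConstruct file for a different XSLT processor mostly comes down to changing the command line that turns a source document and a stylesheet into an output file. The Python sketch below is hypothetical and is not taken from the real SConstruct: the table `XSLT_COMMANDS`, the helpers `xslt_command` and `run_xslt`, and the Saxon jar file name `saxon9he.jar` are assumptions. The argument orders shown follow the usual command-line forms of xsltproc (from libxslt) and Saxon. In an SConstruct, a command like this would typically become the action of an SCons Builder.

```python
import subprocess

# Hypothetical per-processor argument templates (assumed, not from the real
# SConstruct). "{stylesheet}", "{source}" and "{target}" are filled in below.
XSLT_COMMANDS = {
    "xsltproc": ["xsltproc", "-o", "{target}", "{stylesheet}", "{source}"],
    "saxon": ["java", "-jar", "saxon9he.jar",
              "-s:{source}", "-xsl:{stylesheet}", "-o:{target}"],
}


def xslt_command(processor, stylesheet, source, target):
    """Return the argument list that transforms source into target."""
    paths = {"stylesheet": stylesheet, "source": source, "target": target}
    return [arg.format(**paths) for arg in XSLT_COMMANDS[processor]]


def run_xslt(processor, stylesheet, source, target):
    """Run the chosen processor, raising CalledProcessError if it fails."""
    subprocess.run(xslt_command(processor, stylesheet, source, target), check=True)
```

For instance, `xslt_command("xsltproc", "style.xsl", "doc.xhtml", "doc.html")` returns `["xsltproc", "-o", "doc.html", "style.xsl", "doc.xhtml"]`. Keeping the commands in a table means supporting another processor is a one-entry change rather than an edit to every build rule.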


  1. Pioneered by the famous Donald E. Knuth in his TeX typesetting system.
  2. The term “code” (or “coding”) itself illustrates the point made here. It evokes images of cryptic, indecipherable gobbledygook used only by eccentric computer geeks. If the author could change existing usage, he would suggest the term “formula” or “recipe” instead of “code”, to mean “instructions for a computer written in a formal language”.