\def\CMR#1{{\fontfamily{cmr}\fontencoding{OT1}\selectfont#1}}
\iffalse
Dear Sebastian                                   29 May 1994

Here is an article for Baskerville. To make your life easier, why
don't I promise to add or subtract material so that it occupies
exactly two pages.

I need to send you material regarding consultants list.

As I am responsible for any errors of fact or exposition, if you need
to edit it for style or content, I would like that you send the
revised version to me for approval, and if possible consult with me
before making changes.

with best regards

Jonathan
\fi
\title{Backslash---Expansion of macros and so forth}
\author[Jonathan Fine]{Jonathan Fine\\\texttt{J.Fine@uk.ac.cam.pmms}}
\begin{Article}

\noindent
It is usual, in programming languages which admit compilation (such as
{\it C}, BASIC and Pascal) for there to be a rigid and inviolable
separation between code and data.  It is possible for an interpreted
BASIC program to write a program source file which is then loaded and
run, but such is rather bad form.  The same separation generally
applies to Smalltalk, which is probably the most sophisticated of the
interpreted languages.  (My knowledge of LISP is limited.  May its
supporters please note that my endorsement of Smalltalk is, for the
purposes of this column, a personal opinion only).

\TeX, however, has no inbuilt distinction between code and data.  As
far as it is concerned, all is just one long sequence of varying types
of tokens.  This will be made clearer later.  It is not as if there is
one stream from which instructions are drawn, and another from which
data is drawn.  It is usual for compiled programming languages to have
a ``\verb"GOTO"'' mechanism (usually implicit within loop and
conditional constructs, and also subroutine and function calls) that
allows forward and backward jumps within the code stream, which is in
fact more like a heap of tiny sequences of instructions linked by
random access pointers.

Why am I saying all this?
Most beginners expect \TeX\ to behave like other programming languages.
Up to a point it does, particularly if all one wishes to do is write a
simple replacement text macro, or set the values of some registers or
parameters.  But when it comes to reading data from within a macro it
definitely does not, and here beginners generally become unstuck, in
the sense of losing their grip and running off the rails.
%% deletable
In another sense, of course, they become stuck.  You pays your money,
you takes your choice.
%%
I know that I had these problems six years ago when I started with
\TeX.  While the {\em \TeX book} explained to me how \TeX\ behaved, it
did not give examples to clearly dispel my wrong prejudices.
%% deletable
(If you have prejudices or habits, may they be beneficial.)
%%
Hence this article.

Most people have some experience of writing a program, even if only a
humble batch file for use with MS-DOS.  It is a simplification, which
does no harm for the purpose of this article, to imagine the input
stream to \TeX\ being one enormous long list of tokens.  Change of
category codes, \verb"\input" and \verb"\endinput" commands, and also
the \verb"\openin" and \verb"\read" commands do not fundamentally alter
this point of view.  If a format file or some macros have previously
been loaded (and such usually has been) then some of these tokens will
be macros (or more exactly will have macro meaning when executed) and
will thus influence the subsequent operation of \TeX.

It is now time to announce the fundamental law on the expansion of
\TeX\ macros.  Suppose a \TeX\ macro in the input stream (usually but
not necessarily at the very head of the stream) is expanded.  The
effect of this expansion is to alter or edit the input stream, in a
very specific manner.  This is explained on [203] (this means page~203
of {\em The \TeX book}).
Once the parameter text, if any, has been read, and the replacement
text, if any, has been put in its place, the expansion of the macro is
at an end.  It is done, over, finished, and no more.  However, for the
purposes of error reporting \TeX\ keeps a note of how the replacement
text came to arise.  We will see the use of this later.  This
information, however, in no way affects subsequent error-free
execution.  As far as \TeX\ is concerned, it is just as if it had been
presented at this stage with the given amended input stream.
Processing by \TeX\ now continues with the current state and the new
stream of tokens.

Here is an example.  Plain \TeX\ defines
\begin{verbatim}
\def\centerline #1{\line{\hss #1\hss}}
\end{verbatim}
and so the expansion of
\begin{verbatim}
\centerline{<Title>}
\end{verbatim}
is
\begin{verbatim}
\line{\hss <Title>\hss}
\end{verbatim}
and that's it.  This is the end of the expansion of the
\verb"\centerline" macro.  It so happens that \verb"\line" is also a
macro
\begin{verbatim}
\def\line{\hbox to \hsize}
\end{verbatim}
and so we obtain
\begin{verbatim}
\hbox to \hsize{\hss <Title>\hss}
\end{verbatim}
as a subsequent stage from the \verb"\centerline" command.  The token
\verb"\hbox" refers to a primitive \TeX\ command, which is now
executed.  Note that if there are control sequences in the
\verb"<Title>", then they will not be executed until \TeX\ is
processing the contents of the \verb"\hbox".

If there is a misspelt, and thus presumably undefined, control
sequence within the \verb"<Title>", \TeX\ will produce one of its
famous multiline error messages, showing the offending control
sequence within the argument to \verb"\centerline", and that argument
in turn within the expansion of \verb"\centerline".  (The expansion of
\verb"\line" is by then over and done with, and so is not mentioned.)
Knuth has new users run through precisely this situation [33].  Did
you follow his advice and typeset the story about R.~J. Drofnats?  I
confess that I did not.
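
The single expansion step of \verb"\centerline" described above can be
inspected in isolation.  What follows is a minimal sketch, assuming
plain \TeX\ with its definition of \verb"\centerline" as quoted
earlier; the name \verb"\onestep" is hypothetical and chosen only for
this illustration.  The chain of \verb"\expandafter"s expands
\verb"\centerline" exactly once before \verb"\def" stores the result,
so \verb"\line" is left unexpanded.
\begin{verbatim}
% Expand \centerline{Title} once only, and keep the
% resulting tokens as the replacement text of \onestep.
\expandafter\def\expandafter\onestep\expandafter
  {\centerline{Title}}
% Display the stored tokens on the terminal and in the log.
% The body shown should be  \line {\hss Title\hss }
\show\onestep
\end{verbatim}
Nothing is typeset by these lines.  The \verb"\show" command merely
reports what the input stream would have contained after the one
expansion.
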
The expansion of a macro results in a change in the input stream of
tokens.  Let us use the word `performance' to mean the end and final
result of the expansion and execution of the macro and the tokens
contained within, and perhaps their performance also.  The expansion
of \verb"\centerline" is as above.  The execution is to set the text
centered in a horizontal box of width \verb"\hsize".  Beginners may be
frightened by the line of code
\begin{verbatim}
\setbox 0=\centerline{Title}
\end{verbatim}
but experts will know that this is in fact legitimate, and why.

Let us now move on to loops.  I know that such things are avoided by
all except those with tendencies to ovine larceny
%% deletable
(I'm struggling to fill the white space at the end of the article)
%%
but just suppose we wish to read a sequence of letters and---oh
horror---put a small space between each and the next.  There are many
ways to do this (letter space, not steal sheep).  Without a context
there is no right or wrong, although the more bizarre solutions are
more amusing and instructive of human psychology than useful.

Without further ado, let's have some examples.  My favourite is
admirable in its simplicity.  Here it is.
\begin{verbatim}
\def \spaceit #1{#1\littlespace\spaceit}
\end{verbatim}
We assume that \verb"\littlespace" will produce a small space, say by
a kern.  Let's see it in operation.  The performance of
\begin{verbatim}
\spaceit Baskerville
\end{verbatim}
begins with the expansion of \verb"\spaceit"
\begin{verbatim}
B\littlespace \spaceit askerville
\end{verbatim}
and then the \verb"B" and \verb"\littlespace" are performed (\ie
typeset and added to the current horizontal list), leaving
\begin{verbatim}
\spaceit askerville
\end{verbatim}
which now proceeds as before.  This is called ``tail recursion'' by
computer scientists [219].  It is an elegant way of repeating a story
(Groan).

All things, even \verb"Baskerville", will come to an end.
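
The first step of this performance can be checked without setting the
loop running.  The following is a sketch only: the definition of
\verb"\littlespace" as a kern of \verb"0.1em" is an assumption made
for the sake of having something concrete, and \verb"\firststep" is a
hypothetical name.  As \verb"\spaceit" is expanded just once and the
result is stored rather than executed, the absence of a stopping
condition does no harm here.
\begin{verbatim}
% Assumed definition: a small kern (the amount is arbitrary).
\def \littlespace {\kern 0.1em }
\def \spaceit #1{#1\littlespace\spaceit}
% Expand \spaceit once; it takes the single token B as #1.
\expandafter\def\expandafter\firststep\expandafter
  {\spaceit Baskerville}
% The body shown should be  B\littlespace \spaceit askerville
\show\firststep
\end{verbatim}
The stored tokens agree with the amended input stream displayed above:
one letter has been consumed, and \verb"\spaceit" stands ready to take
the next.
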
We need to find a way of persuading \verb"\spaceit" to stop.  One way
to do this is to place a sentinel at the end of \verb"Baskerville",
for which \verb"\spaceit" can test with each iteration.  I will show
how to do this next month.

Testing for the sentinel takes time.  In some situations it is better
to take a more active approach.  Let us look at this.  We want
\begin{verbatim}
\endspaceit
\end{verbatim}
to break the \verb"\spaceit" loop, so that
\begin{verbatim}
\spaceit Baskerville\endspaceit
\end{verbatim}
will insert all those \verb"\littlespace"s.  The penultimate expansion
of \verb"\spaceit" is
\begin{verbatim}
\spaceit e\endspaceit
e \littlespace \spaceit \endspaceit
\end{verbatim}
and once the `\verb"e"' and the \verb"\littlespace" have been done we
have
\begin{verbatim}
\spaceit \endspaceit
\endspaceit \littlespace \spaceit
\end{verbatim}
and now we go for a dirty trick.  With the definition
\begin{verbatim}
\def \endspaceit \littlespace \spaceit {}
\end{verbatim}
the expansion of the previous line is
\begin{verbatim}
% empty
\end{verbatim}
which is just what we want.  There we are, a loop without use of any
of the control primitives.  (It is worth noting that the so called
{\em expansion\/} of a macro might be {\em smaller\/} than its
arguments, or even zero.)

Finally, solutions and exercises.

\noindent
{\bf Solution 3.}
{\em Two tokens have the same meaning.  When does the substitution of
one for the other make a difference?}
For definiteness suppose that we
\begin{verbatim}
\let \RELAX \relax
\end{verbatim}
and then replace some occurrence of \verb"\relax" by \verb"\RELAX".  I
know that this example is unlikely, but it serves to express the
solution to the problem.  It will make a difference in the following
situations.  Firstly,
\begin{verbatim}
\string \relax
\end{verbatim}
and secondly any assignment such as
\begin{verbatim}
\let \relax \something
\def \relax { ... }
\end{verbatim}
and finally
\begin{verbatim}
\def \macro { ... \relax ... }
\end{verbatim}
should an \verb"\if" or \verb"\meaning" be subsequently applied to
\verb"\macro", and as far as I know, that's it.

\noindent
{\bf Solution 4.}
{\em What operational difference is there between
\begin{verbatim}
\def\aaa{aaaaaaaa}
\def\xyz{aaaaaaaa}
\end{verbatim}
and
\begin{verbatim}
\def\aaa{aaaaaaaa}
\let\xyz\aaa
\end{verbatim}
if any at all\/} was the problem.
Macros need memory for their storage, and [383] tells us how much.
The second variant will require less main memory (and make for quicker
\verb"\ifx" tests I presume) than the first.  This is because the
\verb"\let" command [206--7] sets the meaning of the first argument
(\verb"\xyz") to be whatever the current meaning of the second
(\verb"\aaa") is.  \TeX\ stores meanings in its memory.  The
\verb"\let" command sets the meaning pointer for \verb"\xyz" to be
equal to (and so point to the same meaning as) the meaning pointer for
\verb"\aaa".  Moreover, if the code above itself appears in a macro,
this macro will require less storage {\em and\/} execute quicker when
the second variant is used.

\noindent
{\bf Exercise 5.}
This comes from the excellent {\em Around the Bend\/} puzzle column
run by Michael Downes of the American Mathematical Society (email
{\tt mkd@math.ams.org}).  The problem is to write a macro which will
trim the leading and trailing spaces from user supplied text, such as
the parameter text to \verb"\centerline" or \verb"\section".

\noindent
{\bf Exercise 6.}
When unexpandable commands are inserted between the letters of a word
the kerning and ligatures are lost [19, Exercise 5.1].  Compare `WAW'
to `W\/A\/W'.  The second has had \verb"\relax" commands inserted
between the letters.  Clearly, high class letter spacing (should there
be such a thing) will respect the kerning information in the original
font.  For ligatures it is not so clear, and certainly harder.  The
problem is to deal with this kerned letterspacing problem.
And while you're at it, how do we deal with the trailing
\verb"\littlespace" that \verb"\spaceit" will leave at the end of
\verb"Baskerville"?
\end{Article}
\endinput