% $Id: tfm.tex,v 3.1.1.1 1991/08/08 16:04:39 schrod Released schrod $
%------------------------------------------------------------
% taken from TFtoPL 3.1

%
% definition of TFM format
% LaTeX markup
%


% $Log: tfm.tex,v $
% Revision 3.1.1.1  1991/08/08  16:04:39  schrod
% CHANGES BY DON HOSEK:
%  -- Inserted \subsection's.
%  -- Deleted WEB defines.
%  -- `e.g.' now in italics, to be consistent with the rest of the
%     standard.
%
% CHANGES BY JOACHIM SCHROD:
%  -- Changed \bigbreak between WEB sections to \medbreak.
%  -- Added + signs to length specifications in \cmd tags, to show that
%     the param is signed.
%  -- Make formulas look more `math-like' and less `Pascal-like.'
%  -- Reformatted table of TFM header to fit in a TUGboat column. The
%     same with the definition of the |lf| entry.
%
% Revision 3.1  1990/11/15  17:51:05  schrod
% corrected comment which explains where this text comes from.
% (TFtoPL 3.1 instead of VFtoVP 1.0). Updated version number accordingly.
%
% Revision 1.0.1.1  90/07/16  00:00:00  schrod
% appended \endinput.
% 
% Revision 1.0  90/07/04  00:00:00  schrod
% Initial revision
% 


\section{Font metric data}
\label{tfm-format}

\subsection{Introduction}

The idea behind \str{TFM} files is that typesetting routines like
\TeX\ need a compact way to store the relevant information about
several dozen fonts, and computer centers need a compact way to store
the relevant information about several hundred fonts. \str{TFM} files
are compact, and most of the information they contain is highly
relevant, so they provide a solution to the problem.

The information in a \str{TFM} file appears in a sequence of 8-bit
bytes. Since the number of bytes is always a multiple of 4, we could
also regard the file as a sequence of 32-bit words; but \TeX\ uses
the byte interpretation, and so do we. Note that the bytes are
considered to be unsigned numbers.


\subsection{Summary of {\tt TFM} files}

\subsubsection{The header}

The first 24 bytes (6 words) of a \str{TFM} file contain twelve
16-bit integers that give the lengths of the various subsequent
portions of the file. These twelve integers are, in order:
 %
 \begin{center}
 \begin{tabular}{r@{${}={}$}l}
   \id{lf} & length of the entire file, in words;\\
   \id{lh} & length of the header data, in words;\\
   \id{bc} & smallest character code in the font;\\
   \id{ec} & largest character code in the font;\\
   \id{nw} & number of words in the width table;\\
   \id{nh} & number of words in the height table;\\
   \id{nd} & number of words in the depth table;\\
   \id{ni} & number of words in the\\
             \multicolumn2{r}{italic correction table;}\\
   \id{nl} & number of words in the lig/kern table;\\
   \id{nk} & number of words in the kern table;\\
   \id{ne} & number of words in the\\
             \multicolumn2{r}{extensible character table;}\\
   \id{np} & number of font parameter words.\\
\end{tabular}
\end{center}
%
 They are all nonnegative and less than $2^{15}$. We must have
$\id{bc}-1\le \id{ec}\le 255$, $\id{ne}\le 256$, and
$$
   \displaylines{
      \quad \id{lf} = 6+\id{lh}+(\id{ec}-\id{bc}+1)+\id{nw}+\id{nh}\hfill\cr
\noalign{\nobreak}
      \hfill {} +\id{nd}+\id{ni}+\id{nl}+\id{nk}+\id{ne}+\id{np}.\quad\cr
      }
$$
 Note that a font may contain as many as 256 characters (if
$\id{bc}=0$ and $\id{ec}=255$), and as few as 0 characters (if
$\id{bc}=\id{ec}+1$).

Incidentally, when two or more 8-bit bytes are combined to form an
integer of 16 or more bits, the most significant bytes appear first
in the file. This is called big-endian order.
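As an illustration, the following Python sketch decodes the twelve length fields in big-endian order and checks the constraints stated above. It is not part of the format definition. The names \verb|read_lengths| and \verb|FIELDS| are hypothetical, and raising \verb|ValueError| on a violation is an arbitrary choice.

\begin{verbatim}
import struct

# The twelve 16-bit length fields, in file order.
FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh",
          "nd", "ni", "nl", "nk", "ne", "np")

def read_lengths(data: bytes) -> dict:
    """Decode the first 24 bytes of a TFM file and check their consistency."""
    if len(data) < 24:
        raise ValueError("file too short for the 24-byte length block")
    # ">" = big-endian (most significant byte first); "H" = unsigned 16 bits.
    values = struct.unpack(">12H", data[:24])
    n = dict(zip(FIELDS, values))
    if any(v >= 2**15 for v in values):
        raise ValueError("every length must be less than 2^15")
    if not (n["bc"] - 1 <= n["ec"] <= 255):
        raise ValueError("need bc - 1 <= ec <= 255")
    if n["ne"] > 256:
        raise ValueError("need ne <= 256")
    expected = (6 + n["lh"] + (n["ec"] - n["bc"] + 1) + n["nw"] + n["nh"]
                + n["nd"] + n["ni"] + n["nl"] + n["nk"] + n["ne"] + n["np"])
    if n["lf"] != expected:
        raise ValueError(f"lf is {n['lf']}, but the other fields imply {expected}")
    return n
\end{verbatim}

Since \id{lf} counts words, a further check would compare $4\,\id{lf}$ with the actual number of bytes in the file.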


\subsubsection{{\tt TFM} data}

The rest of the \str{TFM} file may be regarded as a sequence of ten
data arrays having the informal specification
$$
   \def\arr$[#1]#2${$\colon \res{array}\ [#1]$ \res{of} #2}
\vbox{\ialign{\hfil\id{#}& \arr#\hfil\cr
   header&$[0\to\id{lh}-1]\id{stuff}$\cr
   char\_info&$[\id{bc}\to\id{ec}]\id{char\_info\_word}$\cr
   width&$[0\to\id{nw}-1]\id{fix\_word}$\cr
   height&$[0\to\id{nh}-1]\id{fix\_word}$\cr
   depth&$[0\to\id{nd}-1]\id{fix\_word}$\cr
   italic&$[0\to\id{ni}-1]\id{fix\_word}$\cr
   lig\_kern&$[0\to\id{nl}-1]\id{lig\_kern\_command}$\cr
   kern&$[0\to\id{nk}-1]\id{fix\_word}$\cr
   exten&$[0\to\id{ne}-1]\id{extensible\_recipe}$\cr
   param&$[1\to\id{np}]\id{fix\_word}$\cr
}}
$$
 The most important data type used here is a \id{fix\_word}, which is
a 32-bit representation of a binary fraction. A \id{fix\_word} is a
signed quantity, with the two's complement of the entire word used to
represent negation. Of the 32 bits in a \id{fix\_word}, exactly 12
are to the left of the binary point; thus, the largest \id{fix\_word}
value is $2048-2^{-20}$, and the smallest is $-2048$. We will see
below, however, that all but two of the \id{fix\_word} values will
lie between $-16$ and $+16$.
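To make this concrete, here is a Python sketch that converts the four bytes of a \id{fix\_word} to an exact rational number. The function name \verb|fix_word_value| is hypothetical. Using \verb|fractions.Fraction| avoids the rounding that floating point would introduce.

\begin{verbatim}
from fractions import Fraction

def fix_word_value(word: bytes) -> Fraction:
    """Value of a fix_word: 32-bit two's complement, 12 integer bits."""
    if len(word) != 4:
        raise ValueError("a fix_word occupies exactly four bytes")
    # Most significant byte first; negative values in two's complement.
    raw = int.from_bytes(word, "big", signed=True)
    # 32 - 12 = 20 bits lie to the right of the binary point.
    return Fraction(raw, 2**20)
\end{verbatim}

For example, the hexadecimal bytes \str{00 10 00 00} represent $2^{20}/2^{20}=1.0$, and \str{FF F0 00 00} represent $-1.0$.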

\medbreak

The first data array is a block of header information, which contains
general facts about the font. The header must contain at least two
words, and for \str{TFM} files to be used with Xerox printing
software it must contain at least 18 words, allocated as described
below. When different kinds of devices need to be interfaced, it may
be necessary to add further words to the header block.

\begin{description}

\item[{$\id{header}[0]$}] is a 32-bit check sum that \TeX\ will
copy into the \str{DVI} output file whenever it uses the font.  Later
on when the \str{DVI} file is printed, possibly on another computer,
the actual font that gets used is supposed to have a check sum that
agrees with the one in the \str{TFM} file used by \TeX. In this way,
users will be warned about potential incompatibilities. (However, if
the check sum is zero in either the font file or the \str{TFM} file,
no check is made.)  The actual relation between this check sum and
the rest of the \str{TFM} file is not important; the check sum is
simply an identification number with the property that incompatible
fonts almost always have distinct check sums.

\item[{$\id{header}[1]$}] is a \id{fix\_word} containing the design
size of the font, in units of \TeX\ points (7227 \TeX\ points = 254
cm).  This number must be at least 1.0; it is fairly arbitrary, but
usually the design size is 10.0 for a ``10 point'' font, i.e., a font
that was designed to look best at a 10-point size, whatever that
really means. When a \TeX\ user asks for a font `\str{at} $\delta$
\str{pt}', the effect is to override the design size and replace it
by $\delta$, and to multiply the $x$ and~$y$ coordinates of the
points in the font image by a factor of $\delta$ divided by the
design size.  {\sl All other dimensions in the\/ \str{TFM} file are
\id{fix\_word}\kern-1pt\ numbers in design-size units.} Thus, for
example, the value of $\id{param}[6]$, one \str{em} or \str{\\quad},
is often 1.0 (the \id{fix\_word} whose integer representation is $2^{20}$), since many fonts have
a design size equal to one em. The other dimensions must be less than
16 design-size units in absolute value; thus, $\id{header}[1]$ and
$\id{param}[1]$ are the only \id{fix\_word} entries in the whole
\str{TFM} file whose first byte might be something besides 0 or 255.

\item[{$\id{header}[2\ldots11]$,}] if present, contains 40 bytes that
identify the character coding scheme. The first byte, which must be
between 0 and 39, is the number of subsequent ASCII bytes actually
relevant in this string, which is intended to specify what
character-code-to-symbol convention is present in the font.  Examples
are \str{ASCII} for standard ASCII, \str{TeX text} for fonts like
\str{cmr10} and \str{cmti9}, \str{TeX math extension} for
\str{cmex10}, \str{XEROX text} for Xerox fonts, \str{GRAPHIC} for
special-purpose non-alphabetic fonts, \str{UNSPECIFIED} for the
default case when there is no information.  Parentheses should not
appear in this name. (Such a string is said to be in {\small BCPL}
format.)

\item[{$\id{header}[12\ldots16]$,}] if present, contains 20 bytes that
name the font family ({\it e.g.}, \str{CMR} or \str{HELVETICA}), in {\small
BCPL} format. This field is also known as the ``font identifier.''

\item[{$\id{header}[17]$,}] if present, contains a first byte
called the \id{seven\_bit\_safe\_flag}, then two bytes that are
ignored, and a fourth byte called the \id{face}. If the value of the
fourth byte is less than 18, it has the following interpretation as a
``weight, slope, and expansion'':  Add 0 or 2 or 4 (for medium or
bold or light) to 0 or 1 (for roman or italic) to 0 or 6 or 12 (for
regular or condensed or extended).  For example, 13 is 0+1+12, so it
represents medium italic extended.  A three-letter code ({\it e.g.},
\str{MIE}) can be used for such \id{face} data.

\item[{$\id{header}[18\ldots{\rm whatever}]$}] might also be present;
the individual words are simply called $\id{header}[18]$,
$\id{header}[19]$, etc., at the moment.

\end{description}
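The header layout just described can be decoded along the following lines. This Python sketch is illustrative only, and its helper names are hypothetical. It assumes that the header starts at byte 24, immediately after the six words of lengths. A field that is only partly present (for example, when $\id{lh}=5$) is simply ignored. Dimensions are exact fractions in design-size units. The helper \verb|dimension_pt| shows how a font loaded `\str{at} $\delta$ \str{pt}' turns them into points.

\begin{verbatim}
from fractions import Fraction

def bcpl_string(field: bytes) -> str:
    """A length byte followed by that many ASCII characters."""
    length = field[0]
    if length >= len(field):
        raise ValueError("BCPL length byte exceeds the field size")
    return field[1:1 + length].decode("ascii")

def face_code(face: int) -> str:
    """Three-letter weight/slope/expansion code for a face byte below 18."""
    if not 0 <= face < 18:
        raise ValueError("no standard interpretation for face >= 18")
    expansion, rest = divmod(face, 6)  # 0/6/12: regular, condensed, extended
    weight, slope = divmod(rest, 2)    # 0/2/4: medium, bold, light; 0/1: R, I
    return "MBL"[weight] + "RI"[slope] + "RCE"[expansion]

def read_header(data: bytes, lh: int) -> dict:
    """Decode header[0..lh-1], which follows the 24-byte length block."""
    if lh < 2:
        raise ValueError("the header must contain at least two words")
    words = [data[24 + 4 * k: 28 + 4 * k] for k in range(lh)]
    info = {
        "checksum": int.from_bytes(words[0], "big"),
        # header[1] is a fix_word: 20 bits right of the binary point.
        "design_size": Fraction(int.from_bytes(words[1], "big", signed=True),
                                2**20),
    }
    if info["design_size"] < 1:
        raise ValueError("design size must be at least 1.0")
    if lh >= 12:                       # header[2..11]: 40 bytes
        info["coding_scheme"] = bcpl_string(b"".join(words[2:12]))
    if lh >= 17:                       # header[12..16]: 20 bytes
        info["family"] = bcpl_string(b"".join(words[12:17]))
    if lh >= 18:                       # header[17]: flag, two ignored bytes, face
        flag, _, _, face = words[17]
        info["seven_bit_safe_flag"] = flag
        info["face"] = face_code(face) if face < 18 else face
    return info

def checksums_compatible(tfm_sum: int, font_sum: int) -> bool:
    # A zero check sum on either side means that no check is made.
    return tfm_sum == 0 or font_sum == 0 or tfm_sum == font_sum

def dimension_pt(value: Fraction, design_size: Fraction,
                 at_size=None) -> Fraction:
    # Dimensions are in design-size units; "at" replaces the design size.
    return value * (design_size if at_size is None else at_size)
\end{verbatim}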

\medbreak

Next comes the \id{char\_info} array, which contains one
\id{char\_info\_word} per character. Each \id{char\_info\_word}
contains six fields packed into four bytes as follows.

\begin{description}

\item[first byte] \id{width\_index} (8 bits)

\item[second byte] \id{height\_index} (4 bits) times 16, plus
\id{depth\_index} (4~bits)

\item[third byte] \id{italic\_index} (6 bits) times 4, plus \id{tag}
(2~bits)

\item[fourth byte] \id{remainder} (8 bits)

\end{description}
% 
 The actual width of a character is $\id{width}[\id{width\_index}]$,
in design-size units; this is a device for compressing information,
since many characters have the same width. Since it is quite common
for many characters to have the same height, depth, or italic
correction, the \str{TFM} format imposes a limit of 16 different
heights, 16 different depths, and 64 different italic corrections.

Incidentally, the relation
$\id{width}[0]=\id{height}[0]=\id{depth}[0]=\id{italic}[0]=0$ should
always hold, so that an index of zero implies a value of zero. The
\id{width\_index} should never be zero unless the character does not
exist in the font, since a character is valid if and only if it lies
between \id{bc} and \id{ec} and has a nonzero \id{width\_index}.
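The bit packing of a \id{char\_info\_word} can be undone with shifts and masks. In this Python sketch the names are hypothetical. The helper \verb|char_info_offset| assumes, as the list of data arrays implies, that the \id{char\_info} array begins immediately after the \id{lh} header words, that is, at word $6+\id{lh}$.

\begin{verbatim}
from typing import NamedTuple

class CharInfo(NamedTuple):
    width_index: int
    height_index: int
    depth_index: int
    italic_index: int
    tag: int
    remainder: int

def char_info_offset(lh: int, bc: int, code: int) -> int:
    """Byte offset of the char_info_word for `code` (bc <= code <= ec)."""
    return 4 * (6 + lh + code - bc)

def unpack_char_info(word: bytes) -> CharInfo:
    b0, b1, b2, b3 = word
    return CharInfo(
        width_index=b0,
        height_index=b1 >> 4,    # upper 4 bits of the second byte
        depth_index=b1 & 0x0F,   # lower 4 bits
        italic_index=b2 >> 2,    # upper 6 bits of the third byte
        tag=b2 & 0x03,           # lower 2 bits
        remainder=b3,
    )

def char_exists(code: int, bc: int, ec: int, info: CharInfo) -> bool:
    # Valid iff bc <= code <= ec and the width index is nonzero.
    return bc <= code <= ec and info.width_index != 0
\end{verbatim}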

\medbreak

The \id{tag} field in a \id{char\_info\_word} has four values that
explain how to interpret the \id{remainder} field.

\begin{description}

\item[$\id{tag}=0\ (\id{no\_tag})$] means that \id{remainder} is
unused.

\item[$\id{tag}=1\ (\id{lig\_tag})$] means that this character has a
ligature/kerning program starting at
$\id{lig\_kern}[\id{remainder}]$.

\item[$\id{tag}=2\ (\id{list\_tag})$] means that this character is
part of a chain of characters of ascending sizes, and not the largest
in the chain.  The \id{remainder} field gives the character code of
the next larger character.

\item[$\id{tag}=3\ (\id{ext\_tag})$] means that this character code
represents an extensible character, i.e., a character that is built
up of smaller pieces so that it can be made arbitrarily large. The
pieces are specified in $\id{exten}[\id{remainder}]$.

\end{description}
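A program that needs a larger version of a character can follow the \id{list\_tag} links. For illustration only, the Python sketch below assumes that the \id{char\_info} data have already been reduced to a dictionary. This dictionary maps each existing character code to its $(\id{tag},\id{remainder})$ pair. The function name is hypothetical. The cycle check is a defensive addition that this section does not prescribe.

\begin{verbatim}
NO_TAG, LIG_TAG, LIST_TAG, EXT_TAG = 0, 1, 2, 3

def size_chain(start: int, tags: dict) -> list:
    """Character codes reachable from `start` via list_tag, smallest first."""
    chain = [start]
    code = start
    while tags[code][0] == LIST_TAG:
        code = tags[code][1]          # remainder = next larger character
        if code not in tags:
            raise ValueError(f"chain refers to missing character {code}")
        if code in chain:
            raise ValueError("cycle in the chain of character sizes")
        chain.append(code)
    return chain
\end{verbatim}

The last code in the returned list is the largest member of the chain, so its \id{tag} is not \id{list\_tag}.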

\medbreak

The \id{lig\_kern} array contains instructions in a simple
programming language that explains what to do for special letter
pairs. Each word in this array is a \id{lig\_kern\_command} of four bytes.

\begin{description}

\item[first byte] \id{skip\_byte}, indicates that this is the
final program step if the byte is 128 or more, otherwise the next
step is obtained by skipping this number of intervening steps.

\item[second byte] \id{next\_char}: ``if \id{next\_char} follows the
current character, then perform the operation and stop, otherwise
continue.''

\item[third byte] \id{op\_byte}, indicates a ligature step if less
than~128, a kern step otherwise.

\item[fourth byte] \id{remainder}.

\end{description}
%
 In a kern step, an additional space equal to
$\id{kern}[256(\id{op\_byte}-128)+\id{remainder}]$ is inserted
between the current character and \id{next\_char}. This amount is
often negative, so that the characters are brought closer together by
kerning; but it might be positive.
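A \id{lig\_kern\_command} can be modeled as a four-field record. In this Python sketch the class and method names are hypothetical. The method \verb|kern_index| computes the subscript into the \id{kern} array exactly as in the formula above.

\begin{verbatim}
from typing import NamedTuple

class LigKernCommand(NamedTuple):
    skip_byte: int
    next_char: int
    op_byte: int
    remainder: int

    @classmethod
    def from_bytes(cls, word: bytes) -> "LigKernCommand":
        return cls(*word)            # four unsigned bytes, in file order

    def is_final(self) -> bool:
        return self.skip_byte >= 128

    def is_kern(self) -> bool:
        return self.op_byte >= 128

    def kern_index(self) -> int:
        if not self.is_kern():
            raise ValueError("ligature step, not a kern step")
        return 256 * (self.op_byte - 128) + self.remainder
\end{verbatim}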

There are eight kinds of ligature steps, having \id{op\_byte} codes
$4a+2b+c$ where $0\le a\le b+c$ and $0\le b,c\le 1$. The character whose
code is \id{remainder} is inserted between the current character and
\id{next\_char}; then the current character is deleted if $b=0$, and
\id{next\_char} is deleted if $c=0$; then we pass over $a$~characters
to reach the next current character (which may have a
ligature/kerning program of its own).

Notice that if $a=0$ and $b=1$, the current character is unchanged;
if $a=b$ and $c=1$, the current character is changed but the next
character is unchanged.
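The ligature operation can be spelled out as list surgery. In the following Python sketch the function names are hypothetical. The text being set is a list of character codes. The index \verb|i| points to the current character, and \verb|i + 1| points to \id{next\_char}. The sketch reads ``pass over $a$~characters'' as counting from the leftmost character that survives the step, which agrees with the two special cases just noted. It returns the new list together with the index of the next current character.

\begin{verbatim}
def decode_lig_op(op_byte: int) -> tuple:
    """Split a ligature op_byte into (a, b, c), where op_byte = 4a + 2b + c."""
    a, rest = divmod(op_byte, 4)
    b, c = divmod(rest, 2)
    if op_byte >= 128 or a > b + c:
        raise ValueError(f"{op_byte} is not one of the eight ligature codes")
    return a, b, c

def apply_ligature(chars: list, i: int, lig: int, op_byte: int) -> tuple:
    a, b, c = decode_lig_op(op_byte)
    cur, nxt = chars[i], chars[i + 1]
    # Insert lig between the pair; keep cur only if b = 1, nxt only if c = 1.
    middle = ([cur] if b else []) + [lig] + ([nxt] if c else [])
    # Then pass over a characters, starting from the leftmost survivor.
    return chars[:i] + middle + chars[i + 2:], i + a
\end{verbatim}

With these definitions the eight legal codes are 0, 1, 2, 3, 5, 6, 7, and~11. For instance, code~0 ($a=b=c=0$) replaces the pair by the single ligature character, which becomes the new current character.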

If the very first instruction of the \id{lig\_kern} array has
$\id{skip\_byte}=255$, the \id{next\_char} byte is the so-called
right boundary character of this font; the value of \id{next\_char}
need not lie between \id{bc} and~\id{ec}. If the very last
instruction of the \id{lig\_kern} array has $\id{skip\_byte}=255$,
there is a special ligature/kerning program for a left boundary
character, beginning at location
$256\id{op\_byte}+\id{remainder}$. The interpretation is that
\TeX\ puts implicit boundary characters before and after each
consecutive string of characters from the same font. These implicit
characters do not appear in the output, but they can affect ligatures
and kerning.
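Both boundary conventions depend only on the first and last entries of the \id{lig\_kern} array. The Python sketch below has a hypothetical function name. It takes the array as a list of $(\id{skip\_byte},\id{next\_char},\id{op\_byte},\id{remainder})$ tuples and returns \verb|None| for any boundary feature that is absent.

\begin{verbatim}
def boundary_info(lig_kern: list) -> tuple:
    """Return (right_boundary_char, left_boundary_start); None if absent."""
    right = left = None
    if lig_kern:
        skip, next_char, _, _ = lig_kern[0]
        if skip == 255:
            right = next_char            # need not lie between bc and ec
        skip, _, op, rem = lig_kern[-1]
        if skip == 255:
            left = 256 * op + rem        # start of the left-boundary program
    return right, left
\end{verbatim}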

If the very first instruction of a character's \id{lig\_kern} program
has $\id{skip\_byte}>128$, the program actually begins in location
$256\id{op\_byte}+\id{remainder}$. This feature allows access to
large \id{lig\_kern} arrays, because the first instruction must
otherwise appear in a location $\le 255$.

Any instruction with $\id{skip\_byte}>128$ in the \id{lig\_kern}
array must have $256\id{op\_byte}+\id{remainder}<\id{nl}$. If
such an instruction is encountered during normal program execution,
it denotes an unconditional halt; no ligature command is performed.
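The rules above combine into a search for the step that applies to a given \id{next\_char}. In this Python sketch the \id{lig\_kern} array is a list of four-integer tuples in file order. The argument \verb|start| is the \id{remainder} of a character whose tag is \id{lig\_tag}, and the function name is hypothetical. The function returns the index of the matching step, or \verb|None| when the program ends or halts without a match. It does not perform the step.

\begin{verbatim}
def find_lig_kern_step(lig_kern: list, start: int, next_char: int):
    """Index of the lig/kern step for `next_char`, or None if none applies."""
    i = start
    skip, _, op, rem = lig_kern[i]
    if skip > 128:
        # Indirect start: the program really begins at 256*op_byte + remainder.
        i = 256 * op + rem
    while True:
        skip, char, _, _ = lig_kern[i]
        if skip > 128:
            return None          # unconditional halt; no command is performed
        if char == next_char:
            return i             # perform this step and stop
        if skip == 128:
            return None          # final step reached without a match
        i += skip + 1            # skip `skip` intervening steps
\end{verbatim}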

\medbreak

Extensible characters are specified by an \id{extensible\_recipe},
which consists of four bytes called \id{top}, \id{mid}, \id{bot}, and
\id{rep} (in this order). These bytes are the character codes of
individual pieces used to build up a large symbol. If \id{top},
\id{mid}, or \id{bot} are zero, they are not present in the built-up
result. For example, an extensible vertical line is like an
extensible bracket, except that the top and bottom pieces are
missing.
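The following Python sketch unpacks an \id{extensible\_recipe} and lists the pieces that actually take part in the built-up symbol. The class name is hypothetical. Only \id{top}, \id{mid}, and \id{bot} are treated as optional, because the text above says nothing about a zero \id{rep}.

\begin{verbatim}
from typing import NamedTuple

class ExtensibleRecipe(NamedTuple):
    top: int
    mid: int
    bot: int
    rep: int

    def pieces(self) -> dict:
        """Piece name -> character code for the pieces that are present."""
        present = {name: code
                   for name, code in (("top", self.top), ("mid", self.mid),
                                      ("bot", self.bot))
                   if code != 0}          # zero means "not present"
        present["rep"] = self.rep         # rep is never treated as absent here
        return present
\end{verbatim}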

\medbreak

\noindent The final portion of a \str{TFM} file is the \id{param}
array, which is another sequence of \id{fix\_word} values.

\begin{description}

\item[{$\id{param}[1]=\id{slant}$}] is the amount of italic slant,
which is used to help position accents. For example,
$\id{slant}=0.25$ means that when you go up one unit, you also go
0.25 units to the right. The \id{slant} is a pure number; it's the
only \id{fix\_word} other than the design size itself that is not
scaled by the design size.

\item[{$\id{param}[2]=\id{space}$}] is the normal spacing between words
in text. Note that the character with code 32 (\str{"\ "}) in the font
need not have anything to do with blank spaces.

\item[{$\id{param}[3]=\id{space\_stretch}$}] is the amount of glue
stretching between words.

\item[{$\id{param}[4]=\id{space\_shrink}$}] is the amount of glue
shrinking between words.

\item[{$\id{param}[5]=\id{x\_height}$}] is the size of one ex in the
font; it is also the height of letters for which accents don't have
to be raised or lowered.

\item[{$\id{param}[6]=\id{quad}$}] is the size of one em in the font.

\item[{$\id{param}[7]=\id{extra\_space}$}] is the amount added to
$\id{param}[2]$ at the ends of sentences.

\end{description}

When the character coding scheme is \str{TeX math symbols}, the font
is supposed to have 15 additional parameters called \id{num1},
\id{num2}, \id{num3}, \id{denom1}, \id{denom2}, \id{sup1}, \id{sup2},
\id{sup3}, \id{sub1}, \id{sub2}, \id{supdrop}, \id{subdrop},
\id{delim1}, \id{delim2}, and \id{axis\_height}, respectively. When
the character coding scheme is \str{TeX math extension}, the font is
supposed to have six additional parameters called
\id{default\_rule\_thickness} and \id{big\_op\_spacing1} through
\id{big\_op\_spacing5}.
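A reader of the \id{param} array might attach the names above as in this Python sketch. The constants and functions are hypothetical. Matching the coding-scheme string exactly and case-sensitively is a simplifying assumption. Parameters beyond the named ones receive placeholder names \verb|param8|, \verb|param9|, and so on. The helper \verb|scale_params| multiplies every parameter except \id{slant} by a size in points, because \id{slant} alone is a pure number.

\begin{verbatim}
TEXT_PARAMS = ["slant", "space", "space_stretch", "space_shrink",
               "x_height", "quad", "extra_space"]
MATH_SYMBOL_PARAMS = ["num1", "num2", "num3", "denom1", "denom2",
                      "sup1", "sup2", "sup3", "sub1", "sub2",
                      "supdrop", "subdrop", "delim1", "delim2", "axis_height"]
MATH_EXTENSION_PARAMS = ["default_rule_thickness"] + [
    f"big_op_spacing{k}" for k in range(1, 6)]

def name_params(param: list, coding_scheme: str) -> dict:
    """Name param[1..np]; `param` is an ordinary 0-based Python list."""
    names = list(TEXT_PARAMS)
    if coding_scheme == "TeX math symbols":
        names += MATH_SYMBOL_PARAMS
    elif coding_scheme == "TeX math extension":
        names += MATH_EXTENSION_PARAMS
    named = {}
    for k, value in enumerate(param, start=1):   # TFM numbering starts at 1
        key = names[k - 1] if k <= len(names) else f"param{k}"
        named[key] = value
    return named

def scale_params(named: dict, size) -> dict:
    # slant is a pure number; every other parameter is in design-size units.
    return {k: v if k == "slant" else v * size for k, v in named.items()}
\end{verbatim}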


\endinput