{\nopagenumbers} % use plain format, no LaTeX! cmd line: tex enctex

%% My abbreviations:

\def\UTF-{\hbox{UTF-}}
\def\spacebk{$\langle${\it space\/}$\rangle$}

\input ofs [ffonts]                  % Charter is in the free fonts group

\setfonts [Charter/10pt]             % default family
\setmath[//]                         % math initialization
\fontdef\tt [CMTypewriter/mag1.1]    % typewriter, x-height correction
\fontdef\verbtt [CMTypewriter-rm/8]  % typewriter for display examples
\fontdef\small [!/9]                 % smaller size for abstract and headline
\addcmd \small {\baselineskip11pt \rm \def\mathversion{normal}\setmath[//]}
\fontdef\fontsekce [!/12]            % for section headings
\addcmd \fontsekce {\bf \let\it=\bi \def\mathversion{bold}\setmath[//]} 
\fontdef\fonttitul [!-bf/14.4]       % title

\def\starthead{\global\headline=
     {\small \ifodd\pageno \hfil \thetitul \headspace \the\pageno 
     \else \the\pageno \headspace \theauthor \hfil \fi}}
\headline={\hfil\starthead}
\def\headspace{\hskip2.5em\relax}
\def\makeheadline{\vbox to0pt{\vskip-25pt
  \line{\vbox to8.5pt{}\the\headline}\vss}\nointerlineskip}
\footline={\setfonts[/7]\setmath[//]\baselineskip=9pt
     \vbox{\hbox{\copytext}\hbox{\copykonvoj}}\hfil
     \global\footline={}}
\def\lastpage{\advance\firstpage by\numpages \advance\firstpage by-1
     \the\firstpage}

\def\copytext{Euro\TeX{} 2003}
\def\copykonvoj{}


%% PlainTeX macros. You can change or omit them

\hsize=12,2cm
\vsize=19,3cm
\hoffset=63pt
\voffset=43pt
\parindent=14pt

\lineskiplimit=-10pt

\exhyphenpenalty=10000
\widowpenalty=10000
\clubpenalty=10000
\raggedbottom  

\newcount\subnum

\def\subtit #1\par{\advance\subnum by1
   \removelastskip %\goodbreak
   \vskip17pt plus2pt minus1pt\noindent{\fontsekce
   \the\subnum\enspace\enspace #1}%
   \par\nobreak\vskip11pt plus2pt minus1pt 
   \everypar{\setbox0=\lastbox \everypar={}}}
\def\reference {\subnum=-1 \kap Reference\par \small}
\def\bib #1 {\par\advance\subnum by1 \leftskip=\parindent 
   \noindent\llap{\expandafter \ifx \csname cit:#1\endcsname\relax
     ??\else\csname cit:#1\endcsname\fi.\enspace}\ignorespaces}

\def\titul #1 \par{\def\thetitul{#1}
   \centerline{\fonttitul #1}\vskip20pt\relax}
\def\author #1 \par{\def\theauthor{#1}
   \centerline{#1}\vskip10pt\relax}
\def\institut #1 \par{\centerline{#1}}
\def\email #1 \par{\centerline{Email: \tt #1}\vskip20pt\relax}
\def\abstract{\bgroup 
   \leftskip=3em \rightskip=3em
   \noindent{\bf Abstract:}\enspace \ignorespaces}
\def\endabstract{\par \egroup\bigskip}
\def\url#1{{\tt#1}}

%\font\fonttitul=cmb10 scaled\magstep3
%\font\fontsekce=cmb10 scaled\magstep2
%\font\verbtt=cmtt8

%% verbatim environment %%
\catcode`\"=13
\def"{\hbox\bgroup\let"=\egroup\setverb\tt}
\def\setverb{\def\do##1{\catcode`##1=12}\dospecials\obeyspaces}
\def\begtt{\medskip\bgroup
   \nobreak\setverb \parskip=0pt %\parindent=0pt
   \catcode`\"=12\catcode`\~=13 \obeylines
   \startverb}
{\catcode`\|=0 \catcode`\\=12
  |gdef|startverb#1\endtt{%
        |tt#1|nobreak|egroup|penalty0|medskip|scannexttoken}}
{\obeyspaces\gdef {\ }}
\long\def\scannexttoken#1{\ifx#1\par\else\noindent#1\fi}

%% lists %%
\def\begitems{\medskip\bgroup\catcode`\*=13 \narrower}
\def\enditems{\par\egroup\medskip}
{\catcode`\*=13 \gdef*{\par\noindent\llap{$\bullet$\ }\ignorespaces}
\gdef\numerate{%                         write \numerate right after \begitems
  \def*{\par\advance\itemnum by1\noindent
    \llap{\bf\the\itemnum. }\ignorespaces}}}

%% bib and cite

\def\citeref #1 #2 {\expandafter\def\csname cit:#1\endcsname{#2}}
\citeref enctex-url  1
\citeref enctex1     2
\citeref cstrip      3
\citeref texbook     4
\citeref yeti-enctex 5
\citeref yeti-home   6
\citeref nameslist   7

\def\cite#1{\expandafter\ifx \csname cit:#1\endcsname\relax
   \message{Warning: cite{#1} is not defined}[??]%
   \else [\csname cit:#1\endcsname]\fi}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\titul Second version of enc\TeX: \UTF-8 support

\author Petr Ol\v s\'ak

\institut Czech Technical University in Prague

\email petr@olsak.net

\abstract
The \UTF-8 encoding keeps the standard ASCII characters unchanged and
encodes the accented letters of our alphabets in two bytes. Standard
8-bit \TeX{} is not ready for \UTF-8 input because it has to treat
such a single character as two tokens. This means that you cannot set
"\catcode", "\uccode", etc. for these characters and that
"\futurelet" cannot pick up the next character in the usual sense. The
second version of my enc\TeX{} solves these problems.

Enc\TeX{} is fully backward compatible with the original \TeX. It adds
ten new primitives with which you can set or read the conversion
tables used by the input processor of \TeX{} and during output to the
terminal, log and "\write" files.

The second version makes it possible to convert multi-byte
sequences to one byte or to a control sequence. You can implement up to
256 \UTF-8 codes as one byte and an unlimited number of other \UTF-8 codes as
control sequences. All internals of 8-bit \TeX{} work in the same
way as if a ``normal one-byte encoding'' of input files were used.

I think that the \UTF-8 encoding will become more common. In that
situation there is no other way than to modify the input processor
of \TeX{}; otherwise 8-bit \TeX{} will be dead in a short time.
\endabstract


\subtit What is enc\TeX?

Enc\TeX{} is a \TeX{} extension which allows re-encoding of the input
stream in the input processor of \TeX{} (before tokenization) and
backward re-encoding of the output stream during "\write" and output
to the terminal and log. It is implemented as a patch to the change
file "tex.ch". The patches are ready for the web2c distribution
at~\cite{enctex-url} and enc\TeX{} may become a standard web2c
extension like mik\TeX{}.  Try the "-enc" option on the command line
to test whether your \TeX{} is equipped with this extension. If not,
you can get and apply the patches and rebuild the \TeX{} binaries. The
patches affect the \TeX{}, e\TeX{}, pdf\TeX{} and pdfe\TeX{} programs;
all of them then provide this extension.
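
The sketch below is one way to make a similar test from inside a
document. It assumes that "\undefined" has no meaning (the usual
plain\TeX{} convention) and that the enc\TeX{} primitives are visible
only when the extension is active; the messages are illustrative only:

\begtt
% assumption: \mubyte exists only if encTeX is active
\ifx\mubyte\undefined
   \message{encTeX primitives are not available}
\else
   \message{encTeX primitives are available}
\fi
\endtt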

The first version of enc\TeX{} was released in 1997. It was able to
do only byte-to-byte conversion by modifying \TeX{}'s internal {\it xord\/}
and {\it xchr\/} vectors. Enc\TeX{} introduced three primitives in its
first version:
"\xordcode" (reads or sets the values of the {\it xord\/} vector for input
re-encoding), "\xchrcode" (reads or sets the values of the {\it xchr\/}
vector for output re-encoding) and "\xprncode" (reads or sets the
values of the newly introduced {\it xprn\/} vector, which controls the
``printability'' of characters, that is, whether a character may be
converted to the "^^ab" form on the output side). See my article
\cite{enctex1} for more details.
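
As an illustration only, a byte-to-byte setting for a single letter
might look as follows. The sketch assumes that the three primitives
take a "\catcode"-like assignment syntax, that the input files are
in CP1250 and that \TeX{} works internally in \hbox{ISO-8859-2}. The
meaning of the value 1 for "\xprncode" (printable) is an assumption
too and should be checked against the enc\TeX{} documentation:

\begtt
% assumed: CP1250 outside, ISO-8859-2 inside TeX
\xordcode"9A="B9  % input:  9A (scaron, CP1250) -> B9
\xchrcode"B9="9A  % output: B9 -> 9A again
\xprncode"B9=1    % assumed: B9 printable, no ^^b9 form
\endtt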

The first version of enc\TeX{} was not widely used because the TCX
tables were restored in the web2c distribution immediately after
enc\TeX{} was released. Roughly speaking, the TCX tables do the same
work as the first version of my enc\TeX{}, but less flexibly. There was no
reason to combine the TCX tables with enc\TeX{}.

I designed and prepared the second version of enc\TeX{} in
December 2002 and released it in January 2003. This version introduces
seven more primitives so that the user can control multi-byte input
re-encoding and reverse output re-encoding. Groups of bytes in the input stream
can be converted to one byte or to a control sequence. The conversion is
done before tokenization, but a control sequence generated by this
conversion is not tokenized again and the token processor does not go
to the ``ignoring spaces'' state after such a control sequence. The backward
conversion during "\write" allows you to convert one byte or a control
sequence to the original group of bytes.

The second version of enc\TeX{} is backward compatible
with the first one, of course.
The detailed documentation is available at \cite{enctex-url}.
Very nice on-line HTML documentation written by David Ne\v cas
(Yeti) is available at~\cite{yeti-enctex}.


\subtit Motivation

I am the maintainer of the "csplain" format, the basic part of the
CS\TeX{} package (for Czech and Slovak users). The "csplain" format is
similar to the well-known plain\TeX{} format (by Don Knuth,
\cite{texbook}). In addition, "csplain" handles all letters
of the Czech and Slovak alphabets. This means that the \hbox{CS-font}s
(encoded in \hbox{ISO-8859-2}) are used by default instead of the Computer
Modern fonts, the hyphenation tables for the Czech and Slovak languages
are read in the same encoding and all Czech and Slovak letters
have to be treated as single non-composite symbols. These symbols have
their "\catcode" set to 11 (letter), so they can be used in control
sequences too.

The Czech and Slovak alphabets are encoded by many mutually incompatible
standards and pseudo-standards in various operating systems and
operating environments. In "csplain", all these encodings
have to be converted to the internal \hbox{ISO-8859-2} at the input
processor level and they have to be converted back to the input
encoding during "\write", terminal and log output. Only this
rule keeps \TeX{} processing independent of the
operating system.

Note: if the source text of a Czech or Slovak document is
transported from one environment to another, the re-encoding to the
standard of the target environment is done automatically or manually
by the user. The main principle is that the Czech and Slovak
characters in the source text have to be displayed correctly
by the operating environment in use before the text is processed by "csplain".

I created the "cstrip" test in 1998 \cite{cstrip}. With this test you
can verify whether you are really using the "csplain" format.
The test checks whether \TeX{}'s input processor is set correctly
for your operating environment: all Czech and
Slovak characters have to be mapped into \hbox{ISO-8859-2} and they have to be
written back in the input encoding to the terminal, log and "\write" files.
The "^^ab" form is not permitted for Czech and Slovak letters.

We were able to set the input processor properly for "csplain" in old
\TeX{} distributions. For example, em\TeX{} used TCP tables. On the
other hand, the web2c distribution had its TCX tables disabled in
1997, so users were not able to implement the "csplain" format
correctly in operating environments where an encoding of our
alphabets other than \hbox{ISO-8859-2} was used. This was the main motivation for
the enc\TeX{} extension of \TeX{}.

Now the new encoding standard derived from Unicode and named \UTF-8 is
used very often. Non-ASCII characters are encoded in two or more
bytes there. If this encoding standard is used in our operating
environment then we need to be able to set up multi-byte conversion in
the input processor of \TeX{}. There is no other way to pass the
"cstrip" test. This was my motivation for the second version of enc\TeX{}.


\subtit Multi-byte re-encoding

The detailed documentation is included in the enc\TeX{} package. Thus,
only a short overview of the principles is presented here.

The second version of enc\TeX{} introduces seven new \TeX{} primitives to
define and control re-encoding between multi-byte input/output and
\TeX{}'s internal representation. These are:

\begitems
* "\mubyte" and "\endmubyte" defining the conversions, 
* "\mubytein", an integer register controlling input conversion, 
* "\mubyteout", an integer register controlling output conversion, 
* "\mubytelog", an integer register controlling output to terminal 
                and log file, 
* "\specialout", an integer register controlling 
                 "\special" argument treatment, and 
* "\noconvert", a primitive suppressing output conversion. 
\enditems

The default values of all the new registers are such that enc\TeX{}
behaves compatibly with unmodified \TeX{} (incidentally, this means
that they are all zero).

You can set the conversion table with the pair
"\mubyte" and "\endmubyte". Examples:

\begtt
\mubyte ^^c1   ^^c3^^81\endmubyte % Aacute
\mubyte ^^c4   ^^c3^^84\endmubyte % Adieresis
...
\endtt

This means that, for example, the group of two bytes "^^c3^^81" will be
converted to the one byte "^^c1" (if "\mubytein" is positive) and this
byte is converted back to the byte sequence "^^c3^^81" during "\write" (if
"\mubyteout" is positive) and in the log and on the terminal (if "\mubytelog" is
positive).
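
A minimal sketch of switching the conversion on, using only the
registers described above (the value 1 is chosen here simply because it
is positive):

\begtt
\mubyte ^^c1   ^^c3^^81\endmubyte % Aacute
\mubytein=1   % ^^c3^^81 -> ^^c1 on input
\mubyteout=1  % ^^c1 -> ^^c3^^81 in \write files
\mubytelog=1  % the same on the terminal and in the log
\endtt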

If your operating environment uses the \UTF-8 encoding then the two bytes
"^^c3^^81" are displayed as \'A. You can do the ``normal things''
with this character as you see it in your text editor:

{\catcode`X=13 \def X{\'A} \catcode`Y=13 \def Y{\'a}
\begtt
\catcode `X=11  \def\myXsequence{...}
...
\def\run{\futurelet \next \dotest}
\def\dotest{\ifx \next X...}
\run Xha
...
\uccode`X=`X \lccode`X=`Y \sfcode`X=999 
...
\endtt
\par}

This behavior is very desirable for the "csplain" format and the "cstrip"
test.  You can convert your old "csplain" documents to the new \UTF-8
encoding and process them with "csplain" in an operating
environment that uses the \UTF-8 standard. You get exactly the same
result as in the old days.  This backward compatibility is the most
important thing for me.

Next example:

\begtt
\mubyte \Alpha     ^^ce^^91\endmubyte 
\mubyte \Beta      ^^ce^^92\endmubyte 
...
\mubyte \leftarrow ^^e2^^86^^90\endmubyte 
\mubyte \uparrow   ^^e2^^86^^91\endmubyte 
...
\endtt

For instance, the group of three bytes "^^e2^^86^^90" is now
converted to the "\leftarrow" control sequence and this control sequence
is converted back to "^^e2^^86^^90" during "\write" if
"\mubyteout"${}\geq 3$. The \UTF-8 encoding of math characters is
implemented this way; see the "utf8raw.tex" file in the enc\TeX{} distribution
and "math-example.tex" for a more complex example.
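
A small sketch of the output side follows. The stream name "\testout"
and the file extension "tst" are hypothetical, chosen only for this
illustration:

\begtt
\mubyte \leftarrow ^^e2^^86^^90\endmubyte
\mubytein=1 \mubyteout=3
\newwrite\testout          % hypothetical stream name
\immediate\openout\testout=\jobname.tst
\immediate\write\testout{a \leftarrow b}
\immediate\closeout\testout
\endtt

If the conversion works as described above, the written file should
contain the three bytes "^^e2^^86^^90" in place of the control sequence.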

The \UTF-8 encoding tables for enc\TeX{} were prepared by
David Ne\v cas~\cite{yeti-home}.
He wrote a Python script which converts
"NamesList.txt" \cite{nameslist} with the Unicode declarations of
characters to the "\mubyte"\dots"\endmubyte" tables.
This script is included in the enc\TeX{} distribution.

There is another way to declare math symbols:

\begtt
\mubyte \utfAlpha   ^^ce^^91\endmubyte 
\mubyte \utfBeta    ^^ce^^92\endmubyte 
...
\def\utfAlpha{\ensuremathmode \Alpha}
\def\utfBeta{\ensuremathmode \Beta}
...
\def\ensuremathmode #1{\ifmmode #1\else $#1$\fi}
\endtt

This second solution is more robust because you can write a math symbol
in \UTF-8 encoding without needing to start math mode
explicitly. Note that these symbols are displayed as natural math symbols
in your text editor. I did not use this solution in the macros
distributed with enc\TeX{} because this concept is not compatible
with common \TeX{} documents where all math mode switches are
written explicitly.


\subtit More amusing examples

You can use the capabilities of enc\TeX{} for purposes other than
encoding. Look at the following simple example:

\begtt
\mubyte \TeX        TeX\endmubyte
\mubyte \copyright  (C)\endmubyte
\mubyte \dots       ...\endmubyte
\endtt

If you write ``"TeX and friends"'' (without a backslash) then the input
processor of enc\TeX{} converts this stream to "\TeX", \spacebk, "a",
"n", "d", \spacebk, "f", "r", etc. This is the desired behavior. Moreover, if
"\mubyteout"${}\geq 3$ then the "\TeX" control sequence is not
expanded during "\write" and it is converted back to its input byte
sequence ``"TeX"''. On the other hand, if you write "\LaTeX", then the
input is converted to the two control sequences "\La\TeX", which is not
desired. You can solve this problem by defining a ``"\La"'' macro or
you can declare:

\begtt
\mubyte \LaTeX      LaTeX\endmubyte
\mubyte \LaTeXe     LaTeX2e\endmubyte
\endtt

Note that both byte sequences in this example begin with the same text
``"LaTeX"''. If the two characters ``"2e"'' follow immediately then
the "\LaTeXe" control sequence is generated (by the second line of this
example); otherwise the "\LaTeX" control sequence is generated.
The order of the lines in this example is unimportant.

What happens if this setting is active and you write "\LaTeX"
(including the backslash)? Nothing bad. The empty control sequence before
the generated control sequence "\LaTeX" is suppressed by enc\TeX{}, which means
that the "\LaTeX" control sequence alone is the result of the conversion.

I implemented the program "vlna", which adds tildes after
Czech one-letter prepositions (v, k, s, u, o, z), entirely
in enc\TeX{} using "\mubyte". It handles math mode correctly
(no tildes are added there). It is available in the enc\TeX{}
distribution in the file "vlna.tex" as an example of a crazy
application of enc\TeX{}.


\subtit References

\bib enctex-url   \url{http://www.olsak.net/enctex.html},
                  the main page of enc\TeX{} project.
\bib enctex1      Petr Ol\v s\'ak: {\it Enc\TeX---A little extension of \TeX},
                  in: TUGboat, \hfil\break vol.~19/4, pp.~336--371.
\bib cstrip       \url{ftp://ftp.math.feld.cvut.cz/pub/cstex/base/cstrip.tar.gz}.
\bib texbook      Donald Knuth: {\it The \TeX{}book}.
\bib yeti-enctex  \url{http://www/trific.ath.cx/tex-mf/enctex/}.
\bib yeti-home    \url{http://www/trific.ath.cx/}, 
                  David Ne\v cas -- home page.
\bib nameslist    \url{http://www.unicode.org/Public/UNIDATA/NamesList.txt}.

\end