{\nopagenumbers} % use plain format, no LaTeX! cmd line: tex enctex

%% My abbreviations:
\def\UTF-{\hbox{UTF-}}
\def\spacebk{$\langle${\it space\/}$\rangle$}

\input ofs [ffonts]                 % Charter is in the free fonts group
\setfonts [Charter/10pt]            % default family
\setmath[//]                        % math initialization
\fontdef\tt [CMTypewriter/mag1.1]   % typewriter, x-height correction
\fontdef\verbtt [CMTypewriter-rm/8] % typewriter for display examples
\fontdef\small [!/9]                % smaller size for abstract and headline
\addcmd \small {\baselineskip11pt \rm \def\mathversion{normal}\setmath[//]}
\fontdef\fontsekce [!/12]           % for section titles
\addcmd \fontsekce {\bf \let\it=\bi \def\mathversion{bold}\setmath[//]}
\fontdef\fonttitul [!-bf/14.4]      % title

\def\starthead{\global\headline=
   {\small \ifodd\pageno \hfil \thetitul \headspace \the\pageno
    \else \the\pageno \headspace \theauthor \hfil \fi}}
\headline={\hfil\starthead}
\def\headspace{\hskip2.5em\relax}
\def\makeheadline{\vbox to0pt{\vskip-25pt
   \line{\vbox to8.5pt{}\the\headline}\vss}\nointerlineskip}
\footline={\setfonts[/7]\setmath[//]\baselineskip=9pt
   \vbox{\hbox{\copytext}\hbox{\copykonvoj}}\hfil \global\footline={}}
\def\lastpage{\advance\firstpage by\numpages \advance\firstpage by-1 \the\firstpage}
\def\copytext{Euro\TeX{} 2003}
\def\copykonvoj{}

%% PlainTeX macros. You can change it or omit it
\hsize=12,2cm \vsize=19,3cm \hoffset=63pt \voffset=43pt
\parindent=14pt
\lineskiplimit=-10pt
\exhyphenpenalty=10000 \widowpenalty=10000 \clubpenalty=10000
\raggedbottom
\newcount\subnum
\def\subtit #1\par{\advance\subnum by1 \removelastskip %\goodbreak
   \vskip17pt plus2pt minus1pt\noindent{\fontsekce \the\subnum\enspace\enspace #1}%
   \par\nobreak\vskip11pt plus2pt minus1pt
   \everypar{\setbox0=\lastbox \everypar={}}}
\def\reference {\subnum=-1 \kap Reference\par \small}
\def\bib #1 {\par\advance\subnum by1 \leftskip=\parindent
   \noindent\llap{\expandafter \ifx \csname cit:#1\endcsname\relax
   ??\else\csname cit:#1\endcsname\fi.\enspace}\ignorespaces}
\def\titul #1 \par{\def\thetitul{#1} \centerline{\fonttitul #1}\vskip20pt\relax}
\def\author #1 \par{\def\theauthor{#1} \centerline{#1}\vskip10pt\relax}
\def\institut #1 \par{\centerline{#1}}
\def\email #1 \par{\centerline{Email: \tt #1}\vskip20pt\relax}
\def\abstract{\bgroup \leftskip=3em \rightskip=3em
   \noindent{\bf Abstract:}\enspace \ignorespaces}
\def\endabstract{\par \egroup\bigskip}
\def\url#1{{\tt#1}}

%\font\fonttitul=cmb10 scaled\magstep3
%\font\fontsekce=cmb10 scaled\magstep2
%\font\verbtt=cmtt8

%% verbatim environment %%
\catcode`\"=13
\def"{\hbox\bgroup\let"=\egroup\setverb\tt}
\def\setverb{\def\do##1{\catcode`##1=12}\dospecials\obeyspaces}
\def\begtt{\medskip\bgroup
   \nobreak\setverb \parskip=0pt %\parindent=0pt
   \catcode`\"=12\catcode`\~=13 \obeylines
   \startverb}
{\catcode`\|=0 \catcode`\\=12
  |gdef|startverb#1\endtt{%
        |tt#1|nobreak|egroup|penalty0|medskip|scannexttoken}}
{\obeyspaces\gdef {\ }}
\long\def\scannexttoken#1{\ifx#1\par\else\noindent#1\fi}

%% lists %%
\def\begitems{\medskip\bgroup\catcode`\*=13 \narrower}
\def\enditems{\par\egroup\medskip}
{\catcode`\*=13
  \gdef*{\par\noindent\llap{$\bullet$\ }\ignorespaces}
  \gdef\numerate{% write \numerate immediately after \begitems
    \def*{\par\advance\itemnum by1\noindent
          \llap{\bf\the\itemnum. }\ignorespaces}}}

%% bib and cite
\def\citeref #1 #2 {\expandafter\def\csname cit:#1\endcsname{#2}}
\citeref enctex-url 1
\citeref enctex1 2
\citeref cstrip 3
\citeref texbook 4
\citeref yeti-enctex 5
\citeref yeti-home 6
\citeref nameslist 7
\def\cite#1{\expandafter\ifx \csname cit:#1\endcsname\relax
   \message{Warning: cite{#1} is not defined}[??]%
   \else [\csname cit:#1\endcsname]\fi}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\titul Second version of enc\TeX: \UTF-8 support

\author Petr Ol\v s\'ak

\institut Czech Technical University in Prague

\email petr@olsak.net

\abstract
The \UTF-8 encoding keeps the standard ASCII characters unchanged and
encodes the accented letters of our alphabets in two bytes. The standard
8bit \TeX{} is not ready for \UTF-8 input because it has to treat such a
single character as two tokens. This means that you cannot set "\catcode",
"\uccode", etc. for these characters and that "\futurelet" cannot look at
the next character in the usual sense.

The second version of my enc\TeX{} solves these problems. Enc\TeX{} is
fully backward compatible with the original \TeX. It adds ten new
primitives by which you can set or read the conversion tables used by the
input processor of \TeX{} and used during output to the terminal, log and
"\write" files. The second version makes it possible to convert multi-byte
sequences to one byte or to a control sequence. You can implement up to 256
\UTF-8 codes as one byte and an unlimited number of other \UTF-8 codes as
control sequences. All internals of 8bit \TeX{} work in the same way as if
a ``normal one byte encoding'' of the input files were used.

I think that the \UTF-8 encoding will become more common. In that
situation, there is no other way than to modify the input processor of
\TeX{}; otherwise 8bit \TeX{} will be dead in a short time.
\endabstract

\subtit What is enc\TeX?


Enc\TeX{} is a \TeX{} extension which re-encodes the input stream in the
input processor of \TeX{} (before tokenization) and re-encodes the output
stream back during "\write" and during output to the terminal and log. It
is implemented as a patch to the change file "tex.ch". The patches for the
web2c distribution are available at~\cite{enctex-url}, and enc\TeX{} may
become a standard web2c extension like mik\TeX{}. Use the "-enc" option on
the command line to test whether your \TeX{} is equipped with this
extension. If it is not, you can get the patches, apply them and rebuild
the \TeX{} binaries. The patches affect the \TeX{}, e\TeX{}, pdf\TeX{} and
pdfe\TeX{} programs. All these programs then provide this extension.

The first version of enc\TeX{} was released in 1997. It was able to do only
byte-to-byte conversion by modifying \TeX{}'s internal {\it xord\/} and
{\it xchr\/} vectors. Enc\TeX{} introduced three primitives in its first
version: "\xordcode" (reads or sets the values of the {\it xord\/} vector
for input re-encoding), "\xchrcode" (reads or sets the values of the
{\it xchr\/} vector for output re-encoding) and "\xprncode" (reads or sets
the values of the newly introduced {\it xprn\/} vector, which controls the
``print-ability'' of characters, that is, whether a character may be
converted to the "^^ab" form on the output side). See my
article~\cite{enctex1} for more details.

The first version of enc\TeX{} was not widely used because the TCX tables
were restored in the web2c distribution immediately after enc\TeX{} was
released. Roughly speaking, the TCX tables do the same work as the first
version of my enc\TeX{}, but they are less flexible. There was no reason to
combine the TCX tables with enc\TeX{}.

I designed and prepared the second version of enc\TeX{} in December 2002
and released it in January 2003. This version introduces seven more
primitives so that the user can control the multi-byte input re-encoding
and the reverse output re-encoding.
Groups of bytes in the input stream can be converted to one byte or to a
control sequence. The conversion is done before tokenization, but a control
sequence generated by this conversion is not tokenized again, and the token
processor does not go to the ``ignoring spaces'' state after such a control
sequence. The backward conversion during "\write" allows you to convert one
byte or a control sequence to the original group of bytes.

The second version of enc\TeX{} is backward compatible with the first one,
of course.

The detailed documentation is available at~\cite{enctex-url}. A very nice
on-line HTML documentation written by David Ne\v cas (Yeti) is available
at~\cite{yeti-enctex}.

\subtit Motivation

I am the maintainer of the "csplain" format, the basic part of the CS\TeX{}
package (for Czech and Slovak users). The "csplain" format is similar to
the well-known plain\TeX{} format (by Don Knuth, \cite{texbook}). In
addition, "csplain" handles all letters of the Czech and Slovak alphabets.
This means that the \hbox{CS-font}s (encoded in \hbox{ISO-8859-2}) are used
by default instead of the Computer Modern fonts, the hyphenation tables for
the Czech and Slovak languages are input in the same encoding, and all
Czech and Slovak letters are treated as single non-composite symbols. These
symbols have "\catcode" set to 11 (letter), thus they can be used in
control sequences too.

The Czech and Slovak alphabets are encoded by many mutually incompatible
standards and pseudo-standards in various operating systems and operating
environments. All these encodings have to be converted to the internal
\hbox{ISO-8859-2} in "csplain" at the input processor level, and they have
to be converted back to the input encoding during "\write", terminal and
log output. Only this rule keeps the \TeX{} processing independent of the
operating system.

Note: if the source text of a Czech or Slovak document is transported from
one environment to another, it is re-encoded to the standard of the target
environment, either automatically or manually by the user. The main
principle is that the Czech and Slovak characters in the source text have
to be displayed correctly by the operating environment in use before the
text is processed by "csplain".

I created the "cstrip" test in 1998 \cite{cstrip}. With this test you can
verify whether you are really using the "csplain" format. The test verifies
whether \TeX{}'s input processor is set correctly for your operating
environment: all Czech and Slovak characters have to be mapped into
\hbox{ISO-8859-2} and they have to be written back in the input encoding to
the terminal, log and "\write" files. The "^^ab" form is not permitted for
Czech and Slovak letters.

We were able to set the input processor properly for "csplain" in old
\TeX{} distributions. For example, em\TeX{} used TCP tables. On the other
hand, the web2c distribution had its TCX tables disabled in 1997, thus
users were not able to implement the "csplain" format correctly in
operating environments where an encoding of our alphabets different from
\hbox{ISO-8859-2} was used. This was the main motivation for the enc\TeX{}
extension of \TeX{}.

Now, the new encoding standard derived from UNICODE and named \UTF-8 is
used very often. Non-ASCII characters are encoded in two or more bytes
here. If this encoding standard is used in our operating environment then
we need to be able to set a multi-byte conversion in the input processor of
\TeX{}. There is no other way to pass the "cstrip" test. This was my
motivation for the second version of enc\TeX{}.

\subtit Multi-byte re-encoding

The detailed documentation is included in the enc\TeX{} package. Thus, only
a short overview of the principles is presented here.

The second version of enc\TeX{} introduces seven new \TeX{} primitives to
define and control re-encoding between multi-byte input/output and the
\TeX{} internal representation. These are:
\begitems
* "\mubyte" and "\endmubyte" defining the conversions,
* "\mubytein", an integer register controlling input conversion,
* "\mubyteout", an integer register controlling output conversion,
* "\mubytelog", an integer register controlling output to terminal
  and log file,
* "\specialout", an integer register controlling "\special" argument
  treatment, and
* "\noconvert", a primitive suppressing output conversion.
\enditems
The default values of all the new registers are such that enc\TeX{} behaves
compatibly with unmodified \TeX{} (that is, they are all zero).

You can set the conversion table with the pair "\mubyte" and "\endmubyte".
Examples:
\begtt
\mubyte ^^c1 ^^c3^^81\endmubyte % Aacute
\mubyte ^^c4 ^^c3^^84\endmubyte % Adieresis
...
\endtt
This means, for example, that the group of two bytes "^^c3^^81" is
converted to the one byte "^^c1" (if "\mubytein" is positive) and that this
byte is converted back to the byte sequence "^^c3^^81" during "\write" (if
"\mubyteout" is positive) and during output to the log and terminal (if
"\mubytelog" is positive).

If your operating environment uses the \UTF-8 encoding then the two bytes
"^^c3^^81" are displayed as \'A. You can do the ``normal things'' with this
character in your text editor:

{\catcode`X=13 \def X{\'A} \catcode`Y=13 \def Y{\'a}
\begtt
\catcode `X=11
\def\myXsequence{...}
...
\def\run{\futurelet \next \dotest}
\def\dotest{\ifx \next X...}
\run Xha ...
\uccode`X=`X \lccode`X=`Y \sfcode`X=999
...
\endtt
\par}

This behavior is very desirable for the "csplain" format and the "cstrip"
test. You can convert your old "csplain" documents to the new \UTF-8
encoding and process them with "csplain" in an operating environment with
the \UTF-8 standard. You get absolutely the same result as in the old days.
This backward compatibility is the most important thing for me.
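
The pieces described above can be put together as follows. This is only a
minimal sketch: the value 1 is an assumption (the text requires just a
positive value), and the second table line, which maps the \UTF-8 bytes of
\'a to the \hbox{ISO-8859-2} code "^^e1", is added here for illustration.
\begtt
\mubyte ^^c1 ^^c3^^81\endmubyte % Aacute, UTF-8 bytes C3 81
\mubyte ^^e1 ^^c3^^a1\endmubyte % aacute, UTF-8 bytes C3 A1
\mubytein=1  % convert the declared byte groups on input
\mubyteout=1 % convert the bytes back during \write
\mubytelog=1 % convert the bytes back on terminal and in log
\endtt
With these settings, a declared two-byte group should reach the token
processor as a single byte, so "\catcode", "\uccode" and "\lccode" can be
assigned to it like to any other character. The registers are set after the
table here only for readability.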

Next example:
\begtt
\mubyte \Alpha ^^ce^^91\endmubyte
\mubyte \Beta ^^ce^^92\endmubyte
...
\mubyte \leftarrow ^^e2^^86^^90\endmubyte
\mubyte \uparrow ^^e2^^86^^91\endmubyte
...
\endtt
For instance, the group of three bytes "^^e2^^86^^90" is now converted to
the "\leftarrow" control sequence, and this control sequence is converted
back to "^^e2^^86^^90" during "\write" if "\mubyteout"${}\geq 3$. The
\UTF-8 encoding of math characters is implemented this way; see the
"utf8raw.tex" file in the enc\TeX{} distribution, and "math-example.tex"
for a more complex example.

The \UTF-8 encoding tables for enc\TeX{} were prepared by David
Ne\v cas~\cite{yeti-home}. He wrote a Python script which converts
"NamesList.txt" \cite{nameslist}, the file with the UNICODE declarations of
characters, to the "\mubyte"\dots"\endmubyte" tables. This script is
included in the enc\TeX{} distribution.

There is another way to declare math symbols:
\begtt
\mubyte \utfAlpha ^^ce^^91\endmubyte
\mubyte \utfBeta ^^ce^^92\endmubyte
...
\def\utfAlpha{\ensuremathmode \Alpha}
\def\utfBeta{\ensuremathmode \Beta}
...
\def\ensuremathmode #1{\ifmmode #1\else $#1$\fi}
\endtt
This second solution is more robust because you can write a math symbol in
the \UTF-8 encoding without a need to start math mode explicitly. Note that
these symbols are displayed as natural math symbols in your text editor. I
did not use this solution in my macros distributed with enc\TeX{} because
this concept is not compatible with common \TeX{} documents where all math
mode switches are written explicitly.

\subtit More funny examples

You can use the enc\TeX{} capability for purposes other than encoding. Look
at the next simple example:
\begtt
\mubyte \TeX TeX\endmubyte
\mubyte \copyright (C)\endmubyte
\mubyte \dots ...\endmubyte
\endtt
If you write ``"TeX and friends"'' (without backslash) then the input
processor of enc\TeX{} converts this stream to "\TeX", \spacebk, "a", "n",
"d", \spacebk, "f", "r", etc. This is the desired behavior.
Moreover, if "\mubyteout"${}\geq 3$ then the "\TeX" control sequence is not
expanded during "\write" and it is converted back to its input byte
sequence ``"TeX"''.

On the other hand, if you write "\LaTeX", then the input is converted to
the two control sequences "\La\TeX", which is not desired. You can solve
this problem by defining the ``"\La"'' macro or you can declare:
\begtt
\mubyte \LaTeX LaTeX\endmubyte
\mubyte \LaTeXe LaTeX2e\endmubyte
\endtt
Note that both byte sequences in this example begin with the same text
``"LaTeX"''. If the two characters ``"2e"'' follow immediately then the
"\LaTeXe" control sequence is generated (by the second line of this
example), otherwise the "\LaTeX" control sequence is generated. The order
of the lines in this example is unimportant.

What happens if this setting is active and you write "\LaTeX" (including
the backslash)? Nothing bad. The empty control sequence before the
generated control sequence "\LaTeX" is suppressed by enc\TeX{}, which means
that only the "\LaTeX" control sequence is the result of the conversion.

I implemented the program "vlna", which adds tildes after Czech one-letter
prepositions (v, k, s, u, o, z), entirely in enc\TeX{} using "\mubyte". It
correctly handles math mode (no tildes are added there). It is available in
the enc\TeX{} distribution in the file "vlna.tex" as an example of a crazy
application of enc\TeX{}.

\subtit References

\bib enctex-url \url{http://www.olsak.net/enctex.html},
     the main page of the enc\TeX{} project.

\bib enctex1 Petr Ol\v s\'ak: {\it Enc\TeX---A little extension of \TeX},
     in: TUGboat, \hfil\break vol.~19/4, pp.~336--371.

\bib cstrip \url{ftp://ftp.math.feld.cvut.cz/pub/cstex/base/cstrip.tar.gz}.

\bib texbook Donald Knuth: {\it The \TeX{}book}.

\bib yeti-enctex \url{http://www/trific.ath.cx/tex-mf/enctex/}.

\bib yeti-home \url{http://www/trific.ath.cx/}, David Ne\v cas -- home page.

\bib nameslist \url{http://www.unicode.org/Public/UNIDATA/NamesList.txt}.

\end