Revision | c85b9aa3a044213f560857e22da5b95f591a6f33 (tree) |
---|---|
Date | 2011-11-21 20:01:35 |
Author | Hironori Kitagawa <h_kitagawa2001@yaho...> |
Committer | Hironori Kitagawa |
Updated the draft for post-proceedings.
@@ -74,21 +74,22 @@ internal processing methods of \LuaTeX-ja. | ||
74 | 74 | To typeset Japanese documents with \TeX, ASCII \pTeX~\cite{ptex} has |
75 | 75 | been widely used in Japan. There are other methods---for example, using |
76 | 76 | Omega and OTP~\cite{omega}, or with the CJK package---to do so, however, |
77 | -these alternative methods did not become a majority. The author thinks | |
77 | +these alternative methods did not become majority. The author thinks | |
78 | 78 | that this is because \pTeX\ enables us to produce high-quality documents |
79 | 79 | (e.g.,~supporting vertical typesetting), and the appearance of \pTeX\ is |
80 | 80 | earlier than that of alternatives described above. |
81 | 81 | |
82 | -However, \pTeX\ has been left behind from the extensions of \TeX\ | |
83 | -such as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In | |
84 | -recent years, the situation has become better, because of development | |
85 | -of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}), | |
82 | +However, \pTeX\ has been left behind from the extensions of \TeX\ such | |
83 | +as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In recent | |
84 | +years, the situation has become better, by development of | |
85 | +|ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}), | |
86 | 86 | $\varepsilon$-\pTeX~\cite{eptex} by the author,~and u\pTeX~\cite{uptex} |
87 | -by Takuji Tanaka (田中琢爾). However, continuing this approach, namely, to develop | |
88 | -an engine extension localized for Japanese, is not wise. This approach | |
89 | -needs lots of work for \emph{each} engine, and since \LuaTeX\ has an ability | |
90 | -to hook \TeX's internal process by using Lua callbacks, the necessity of | |
91 | -an engine extension is getting smaller. | |
87 | +by Takuji Tanaka (田中琢爾). However, continuing this approach, namely, | |
88 | +to develop an engine extension localized for Japanese, is not wise. This | |
89 | +approach needs lots of work for \emph{each} engine. In addition, if we | |
90 | +use \LuaTeX, the necessity of an engine extension is getting smaller | |
91 | +because \LuaTeX\ has an ability to hook \TeX's internal process by using | |
92 | +Lua callbacks. | |
92 | 93 | |
93 | 94 | |
94 | 95 | There were several experimental attempts to typeset |
@@ -111,18 +112,18 @@ these situations. | ||
111 | 112 | |
112 | 113 | \subsection{Development policy of \LuaTeX-ja} |
113 | 114 | \label{ssec-pol} |
114 | -The first aim of \LuaTeX-ja project is to implement features (from the | |
115 | -`primitive' level) of \pTeX\ as macros under \LuaTeX, so \LuaTeX-ja is | |
116 | -much affected by \pTeX. However, as development proceeds, some | |
117 | -technical/conceptual difficulties are arisen. Hence we changed the aim | |
115 | +The first aim of \LuaTeX-ja project was to implement features (from the | |
116 | +`primitive' level) of \pTeX\ as macros under \LuaTeX, therefore \LuaTeX-ja is | |
117 | +much affected by \pTeX. However, as development proceeded, some | |
118 | +technical/conceptual difficulties arose. Hence we changed the aim | |
118 | 119 | of the project as follows: |
119 | 120 | \begin{itemize} |
120 | 121 | \item\emph{\LuaTeX-ja offers at least the same flexibility of |
121 | 122 | typesetting that p\TeX\ has.} |
122 | 123 | |
123 | - We think that the ability of producing outputs conformed to | |
124 | + We are not satisfied with the ability of producing outputs conformed to | |
124 | 125 | JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for |
125 | - typesetting, or to a technical note~\cite{w3c} by W3C is not enough; | |
126 | + typesetting, or to a technical note~\cite{w3c} by W3C; | |
126 | 127 | if one wants to produce very incoherent outputs for some reason, it |
127 | 128 | should be possible. |
128 | 129 | In this point, previous attempts of Japanese typesetting with \LuaTeX\ |
@@ -144,59 +145,66 @@ In this point, previous attempts of Japanese typesetting with \LuaTeX\ | ||
144 | 145 | \subsection{Overview of the processes} |
145 | 146 | \label{ssec-over} |
146 | 147 | We describe an outline of \LuaTeX-ja's process in order. |
148 | + | |
147 | 149 | \begin{itemize} |
148 | 150 | \item In the |process_input_buffer| callback: treatment of breaking |
149 | 151 | lines after a Japanese character (in Subsection~\ref{ssec-line}). |
150 | 152 | |
151 | 153 | \item In the |hyphenate| callback: font replacement. |
152 | 154 | |
153 | -\LuaTeX-ja looks into for each \textit{glyph\_node}~$p$ in the list. If | |
155 | +\LuaTeX-ja looks into each \textit{glyph\_node}~$p$ in the horizontal list. If | |
154 | 156 | the character represented by $p$ is considered as a Japanese |
155 | - character, the font used in $p$ is replaced by the value of | |
157 | + character, the font used at $p$ is replaced by the value of | |
156 | 158 | |\ltj@curjfnt|, an attribute for `the current Japanese font' |
157 | 159 | at~$p$. |
158 | 160 | |
159 | -Furthermore the subtype of $p$ is subtracted by 1 to suppress | |
160 | - hyphenation around it by \LuaTeX, because later processes of | |
161 | +Furthermore, the subtype of $p$ is decreased by 1 to suppress | |
162 | + hyphenation around $p$ by \LuaTeX, because later processes of | |
161 | 163 | \LuaTeX-ja take care of all things about Japanese characters. |
162 | 164 | |
163 | 165 | \item In |pre_linebreak_filter| and |hpack_filter| callbacks: |
164 | 166 | |
165 | 167 | \begin{enumerate} |
166 | 168 | \item \LuaTeX-ja has its own stack system, and the current horizontal |
167 | - list is traversed in this stage to determine what is the level of | |
168 | - \LuaTeX-ja's internal stack at the end of the list (in | |
169 | - Subsection~\ref{ssec-stack}). | |
169 | + list is traversed in this stage to determine what the level of | |
170 | + \LuaTeX-ja's internal stack at the end of the list is. We will | |
171 | + discuss it in Subsection~\ref{ssec-stack}. | |
170 | 172 | |
171 | 173 | \item In this stage, \LuaTeX-ja inserts glues/kerns for Japanese |
172 | - typesetting in the list. This is the core of \LuaTeX-ja (in | |
173 | - Subsection~\ref{ssec-jglue}). | |
174 | + typesetting in the list. This is the core routine of \LuaTeX-ja. | |
175 | + We will discuss it in Subsections | |
176 | + \ref{ssec-jglue}~and~\ref{ssec-jspec}. | |
174 | 177 | |
175 | 178 | \item To make a match between a metric and a real font, sometimes |
176 | - adjustument of the position of (Japanese) glyphs are performed | |
177 | - (Subsection~\ref{ssec-width}). | |
179 | + adjustment of the position of (Japanese) glyphs is performed. | |
180 | + We will discuss it in Subsection~\ref{ssec-width}. | |
178 | 181 | \end{enumerate} |
179 | -\item In the |mlist_to_hlist| callback: replacement of Japanese characters in math formulas. | |
180 | -This stage is similar to adjustument of the position of glyphs (see | |
181 | - above), so we omit it from this paper. | |
182 | +\item In the |mlist_to_hlist| callback: treatment of Japanese characters | |
183 | + in math formulas. This stage is similar to adjustment of the | |
184 | + position of glyphs (see above), so we omit the description of this stage | |
185 | + from this paper. | |
182 | 186 | \end{itemize} |
183 | 187 | |
188 | +In this paper, an \emph{alphabetic character} means a non-Japanese | |
189 | +character. Similarly, we use the term \emph{alphabetic font} as the | |
190 | +counterpart of a Japanese font. | |
191 | + | |
184 | 192 | \subsection{Contents of this paper} |
185 | 193 | Here we describe the contents of the rest of this paper briefly. In |
186 | -Section~\ref{sec:differences_with_ptex}, | |
187 | -we describe major differences between \pTeX\ and \LuaTeX-ja. | |
188 | -The next section, Section~\ref{sec:distinction_of_characters}, | |
189 | -is concentrated on a problem `how we | |
190 | -distinguish between Japanese characters and alphabetic characters'. In | |
191 | -Section~\ref{sec:current_status}, we show rest of features of \LuaTeX-ja package, and | |
192 | -current status of the package. Finally, in Section~\ref{sec:implementation}, we describe some | |
193 | -internal routines of \LuaTeX-ja. | |
194 | +Section~\ref{sec:differences_with_ptex}, we describe major differences | |
195 | +between \pTeX\ and \LuaTeX-ja. The next section, | |
196 | +Section~\ref{sec:distinction_of_characters}, concentrates on the | |
197 | +problem of how we distinguish between Japanese characters and alphabetic | |
198 | +characters. In Section~\ref{sec:current_status}, we show current | |
199 | +development status of the package. Finally, in | |
200 | +Section~\ref{sec:implementation}, we describe some internal routines of | |
201 | +\LuaTeX-ja. | |
194 | 202 | |
195 | 203 | \subsection{General information of the project} |
196 | 204 | This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki |
197 | 205 | is located on |
198 | 206 | \url{http://sourceforge.jp/projects/luatex-ja/wiki/}. There is |
199 | -no stable version on October 15, 2011, however a set of developer sources can be | |
207 | +no stable version as of October 22, 2011; however, a set of developer sources can be | |
200 | 208 | obtained from the git repository. Members of the project team are as follows |
201 | 209 | (in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato, |
202 | 210 | Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda, |
@@ -212,7 +220,7 @@ overview of \pTeX, please see Okumura~\cite{ptexjp}. | ||
212 | 220 | |
213 | 221 | \subsection{Names of control sequences} |
214 | 222 | \label{ssec-csname} Because \pTeX\ is an engine modification of Knuth's |
215 | -original \TeX82 engine, some primitives added by it take a form that is | |
223 | +original \TeX82 engine, some of the additional primitives take a form that is | |
216 | 224 | very difficult to be simulated by a macro. For example, an additional |
217 | 225 | primitive |\prebreakpenalty|$\langle\hbox{\it |
218 | 226 | char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in \pTeX\ |
@@ -221,21 +229,19 @@ $\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it | ||
221 | 229 | penalty}\rangle$, and this form |\prebreakpenalty|$\langle\hbox{\it |
222 | 230 | char\_code}\rangle$ can be also used for retrieving the value. |
223 | 231 | |
224 | -Moreover, there are some parameters which values of them at the end of a | |
225 | -horizontal box or that of a paragraph are effective in whole box or | |
226 | -paragraph. These parameters were implemented as additional internal | |
227 | -parameters in \pTeX. However, the implementation of these parameters in | |
228 | -\LuaTeX-ja is not so easy; we will discuss it in | |
229 | -Subsection~\ref{ssec-stack}. | |
232 | +Moreover, there are some internal parameters of \pTeX\ whose values at the end of a | |
233 | +horizontal box or of a paragraph are valid for the whole box or | |
234 | +paragraph. However, the implementation of these parameters in | |
235 | +\LuaTeX-ja is not so easy; we will discuss it in Subsection~\ref{ssec-stack}. | |
230 | 236 | |
231 | -From above two~problems we discussed above, the assignment and retrieval | |
237 | +From the two~problems discussed above, the assignment and retrieval | |
232 | 238 | of most parameters in \LuaTeX-ja are summarized into the following |
233 | 239 | three~control sequences: |
234 | 240 | \begin{itemize} |
235 | 241 | \item |\ltjsetparameter{|$\langle\hbox{\it |
236 | 242 | name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local |
237 | 243 | assignment. |
238 | -\item |\ltjglobalsetparameter|: for global assignment. These two control | |
244 | +\item |\ltjglobalsetparameter|: for global assignment. Note that these two control | |
239 | 245 | sequences obey the value of |\globaldefs| primitive. |
240 | 246 | \item |\ltjgetparameter{|$\langle\hbox{\it |
241 | 247 | name}\rangle$|}[{|$\langle\hbox{\it optional |
@@ -272,7 +278,7 @@ letter `あ' will be treated as an alphabetic character by | ||
272 | 278 | \LuaTeX-ja. Then, it is natural to have a space between `あ' and `y' in |
273 | 279 | the output, where the actual output in the figure does not so. This is |
274 | 280 | because `あ' is considered a Japanese character by \LuaTeX-ja, |
275 | -when \LuaTeX-ja does a decision whether U+FFFFF will be added to the | |
281 | +when \LuaTeX-ja makes the decision whether U+FFFFF will be added to the | |
276 | 282 | input line~2. |
277 | 283 | |
278 | 284 | \begin{figure} |
@@ -295,7 +301,7 @@ JFMs are essentially same, and only differ in their names. For example, | ||
295 | 301 | |min10.tfm| and |goth10.tfm|, which are JFMs shipped with \pTeX\ for |
296 | 302 | seriffed \emph{mincho} family and sans-seriffed \emph{gothic} family, |
297 | 303 | differ their |FAMILY| and |FACE| only. Moreover, |jis.tfm| and |
298 | -|jisg.tfm|, which consists a parts of \emph{jis} font metric, which is | |
304 | +|jisg.tfm|, which are included in the \emph{jis} font metric, which is | |
299 | 305 | used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦), |
300 | 306 | are totally same as binary files. Considering this situation, we |
301 | 307 | decided to separate `real' fonts and metrics used for them in |
@@ -305,14 +311,14 @@ remarks: | ||
305 | 311 | \begin{itemize} |
306 | 312 | \item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|. |
307 | 313 | \item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so |
308 | - |file:| and |name:| prefixes, and various font features can be | |
309 | - used as the line~1 in Figure~\ref{fig-jfdef}. | |
314 | + \hbox{\tt file:} and \hbox{\tt name:} prefixes, and various font features can be | |
315 | + used as the first line in Figure~\ref{fig-jfdef}. | |
310 | 316 | \item The |jfm| key specifies the metric for the font. In |
311 | 317 | Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a |
312 | 318 | Lua script named |jfm-ujis.lua|. This metric is the standard |
313 | 319 | metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf} |
314 | 320 | package~\cite{otf}. |
315 | -\item The |psft:| prefix can be used to specify name-only, non-embedded | |
321 | +\item The \hbox{psft:} prefix can be used to specify name-only, non-embedded | |
316 | 322 | fonts. When one displays a PDF with these fonts, actual fonts which |
317 | 323 | will be used for them depend on a pdf reader. |
318 | 324 | \end{itemize} |
@@ -326,7 +332,7 @@ metrics by default; |jfm-ujis.lua|, |jfm-jis.lua| based on the | ||
326 | 332 | \emph{jis} font metric, and |jfm-min.lua| based on old |min10.tfm|. |
327 | 333 | |
328 | 334 | Note that |-kern| in features |
329 | -is important, because kerning information from real font itself will | |
335 | +is important, because kerning information from a real font itself will | |
330 | 336 | clash with glue/kern information from the metric. |
331 | 337 | |
332 | 338 | \begin{figure} |
@@ -351,7 +357,7 @@ process will be done when a horizontal box or a paragraph is ended, so | ||
351 | 357 | |
352 | 358 | The situation for Japanese characters is more complicated. |
353 | 359 | Glues (and kerns) which are needed for Japanese |
354 | -typesetting will be divided into the following three categories: | |
360 | +typesetting are divided into the following three categories: | |
355 | 361 | \begin{itemize} |
356 | 362 | \item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue}, |
357 | 363 | for short). |
@@ -385,6 +391,8 @@ this specification are to behave like alphabetic characters in \LuaTeX\ | ||
385 | 391 | for \LuaTeX-ja's process. |
386 | 392 | |
387 | 393 | \subsection{Insertion of glues/kerns for Japanese typesetting: specification} |
394 | +\label{ssec-jspec} | |
395 | + | |
388 | 396 | \begin{table} |
389 | 397 | \caption{Examples of differences between \pTeX\ and \LuaTeX-ja.} |
390 | 398 | \label{tab-jfmglue} |
@@ -422,16 +430,16 @@ Now we will take a look inside the insertion process itself, and describe 4~poin | ||
422 | 430 | \begin{description} |
423 | 431 | \item[Ignored Nodes] |
424 | 432 | As noted in the previous subsection, the insertion process in \pTeX\ can |
425 | - be interrupted by saying |{}| or anything else\footnote{This | |
433 | + be interrupted by saying |{}| or anything else.\footnote{This | |
426 | 434 | is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for |
427 | - \texttt{min10.tfm} and other `old' JFMs work.}. This leads | |
428 | - the second row in Table~\ref{tab-jfmglue}, or | |
429 | - Figure~\ref{fig-ptexjfm}. `The process is interrupted' means | |
430 | - that \pTeX\ does not think the letter `】\inhibitglue' is | |
431 | - followed by `\inhibitglue【', hence two half-width glues are | |
432 | - inserted between between `】\inhibitglue' and `\inhibitglue【', | |
433 | - where one is from `】\inhibitglue' and another is from | |
434 | - `\inhibitglue【'. | |
435 | + \texttt{min10.tfm} and other `old' JFMs work.} This leads to the | |
436 | + second row in Table~\ref{tab-jfmglue}, or | |
437 | + Figure~\ref{fig-ptexjfm}. Here `the process is interrupted' | |
438 | + means that \pTeX\ does not think the letter `】\inhibitglue' | |
439 | + is followed by `\inhibitglue【', hence two half-width glues | |
440 | + are inserted between `】\inhibitglue' and `\inhibitglue【', | |
441 | + where the left one is from `】\inhibitglue' and the right one | |
442 | + is from `\inhibitglue【'. | |
435 | 443 | |
436 | 444 | On the other hand, in \LuaTeX-ja, the process is done inside |
437 | 445 | |hpack_filter| and |pre_linebreak_filter| callbacks. Hence, |
@@ -444,14 +452,14 @@ As noted in the previous subsection, the insertion process in \pTeX\ can | ||
444 | 452 | \emph{penalty\_node}---, as shown in (4). |
445 | 453 | |
446 | 454 | |
447 | -By the way, around a \emph{glyph\_node} $p$ there may be some nld odes | |
455 | +By the way, around a \emph{glyph\_node} $p$ there may be some nodes | |
448 | 456 | attached to $p$. These are an accent and kerns for |
449 | - positioning it, and a kern from the italic | |
457 | + moving it to the right place, and a kern from the italic | |
450 | 458 | correction\footnote{\TeX82 (and \LuaTeX) does not distinguish |
451 | 459 | between explicit kern and a kern for italic correction. To |
452 | - distinguish them, an additional subtype for kern is introduced | |
460 | + distinguish them, an additional subtype for a kern is introduced | |
453 | 461 | in \pTeX. On the other hand, \LuaTeX-ja uses an additional attribute and |
454 | - redefines \texttt{\char`\\/}.} for $p$. It is natural that | |
462 | + redefines \texttt{\char`\\/} to set this attribute.} for $p$. It is natural that | |
455 | 463 | these attachments should be ignored inside the process. Hence |
456 | 464 | \LuaTeX-ja takes this approach, as the latest version of |
457 | 465 | \pTeX\ (p3.2). This explains (2) in the figure. |
@@ -485,7 +493,7 @@ However this seems to be unnatural, since two Japanese fonts in the | ||
485 | 493 | \mc 明朝)\gt (ゴシック |
486 | 494 | \end{quote} |
487 | 495 | One might have the situation that this default behavior is not |
488 | - suitable. \LuaTeX-ja offers a way to cope with this case, but | |
496 | + suitable. \LuaTeX-ja offers a way to handle this situation, but | |
489 | 497 | we leave it to the manual~\cite{man}. |
490 | 498 | |
491 | 499 | \item[Fonts with Different Metrics] |
@@ -503,9 +511,9 @@ As the previous paragraph, this input yields the following, by \pTeX: | ||
503 | 511 | \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大 |
504 | 512 | \end{quote} |
505 | 513 | We thought that amounts of spaces between parentheses in above output |
506 | - are too much. So we changed the default behavior of | |
507 | - \LuaTeX-ja so that the amount of a glue between two Japanese | |
508 | - characters with different metrics is the average of a glue | |
514 | + are too much. Hence we changed the default behavior of | |
515 | + \LuaTeX-ja, so that the amount of a glue between two Japanese | |
516 | + characters with different metrics is the \emph{average} of a glue | |
509 | 517 | from the left character and that from the right |
510 | 518 | character. For example, Figure~\ref{fig-diffmet} shows the |
511 | 519 | output from above input. The width of glue indicated `(1)' is |
@@ -538,33 +546,32 @@ We thought that amounts of spaces between parentheses in above output | ||
538 | 546 | |
539 | 547 | \item[\emph{kanjiskip} and \emph{xkanjiskip}] |
540 | 548 | In \pTeX, the value of \emph{xkanjiskip} is controlled by a skip named |
541 | - |\xkanjiskip|. A defect of this implementation is that the | |
542 | - value of \emph{xkanjiskip} is not connected with the size of | |
543 | - the currnt Japanese font. It seems that |EXTRASPACE|, | |
549 | + |\xkanjiskip|. A well-known defect of this implementation is | |
550 | + that the value of \emph{xkanjiskip} is not connected with the | |
551 | + size of the current Japanese font. It seems that |EXTRASPACE|, | |
544 | 552 | |EXTRASTRETCH|, |EXTRASHRINK| parameters in a JFM are |
545 | 553 | reserved for specifying the default value of |
546 | 554 | \emph{xkanjiskip} in a unit of the design size, but \pTeX\ |
547 | - did not use these parameters. | |
555 | + did not use these parameters, actually. | |
548 | 556 | |
549 | 557 | Considering this situation of p\TeX, \LuaTeX-ja can use the value of |
550 | 558 | \emph{xkanjiskip} that is specified in a metric. If the value of |
551 | - \emph{xkanjiskip} on user side (this is the | |
552 | - \textsf{xkanjiskip} parameter in |\ltjsetparameter|) is | |
559 | + \emph{xkanjiskip} on user side (this is the value of | |
560 | + \textsf{xkanjiskip} parameter of |\ltjsetparameter|) is | |
553 | 561 | |\maxdimen|, then \LuaTeX-ja uses the specification from |
554 | 562 | the currently used metric as the actual value of |
555 | - \emph{xkanjiskip}. | |
556 | -This description also applies for \emph{kanjiskip}. | |
563 | + \emph{xkanjiskip}. This description also applies for \emph{kanjiskip}. | |
557 | 564 | \end{description} |
558 | 565 | |
559 | 566 | \section{Distinction of characters} |
560 | -\label{sec:distinction_of_characters} | |
561 | -Since \LuaTeX\ can handle Unicode characters natively, it is a major | |
562 | -problem that how we distinguish Japanese characters and alphabetic | |
563 | -characters. For example, the multiplication sign (U+00D7) exists both in | |
564 | -ISO-8859-1 (hence in Latin-1 Supplement in Unicode) and in the basic | |
565 | -Japanese character set JIS~X~0208. It is not desirable that this | |
566 | -character is treated as an alphabetic char, because this symbol is often | |
567 | -used in the sense of `negative' in Japan. | |
567 | +\label{sec:distinction_of_characters} Since \LuaTeX\ can handle Unicode | |
568 | +characters natively, a major problem is how we distinguish | |
569 | +Japanese characters and alphabetic characters. For example, the | |
570 | +multiplication sign (U+00D7) exists both in ISO-8859-1 (hence in Latin-1 | |
571 | +Supplement in Unicode) and in the basic Japanese character set | |
572 | +JIS~X~0208. It is not desirable that this character is always treated as | |
573 | +an alphabetic character, because this symbol is often used in the sense | |
574 | +of `negative' in Japan. | |
568 | 575 | |
569 | 576 | \subsection{Character ranges} |
570 | 577 | Before we describe the approach taken is \LuaTeX-ja, we review the |
@@ -573,13 +580,13 @@ approach taken by u\pTeX. u\pTeX\ extends the |\kcatcode| primitive in | ||
573 | 580 | among alphabetic characters~(15), \emph{kanji}~(16), \emph{kana}~(17), |
574 | 581 | \emph{kanji}, \emph{Hangul}~(17), or~\emph{other CJK characters}~(18). |
575 | 582 | The assignment to |\kcatcode| can be done by a Unicode |
576 | -block\footnote{There are some exceptions. For example, U+FF00--FFEF | |
583 | +block.\footnote{There are some exceptions. For example, U+FF00--FFEF | |
577 | 584 | (Halfwidth and Fullwidth Forms) are divided into three blocks in recent |
578 | -u\pTeX.}. | |
585 | +u\pTeX.} | |
579 | 586 | |
580 | 587 | \LuaTeX-ja adopted a different approach. There are many Unicode blocks |
581 | 588 | in Basic Multilingual Plane which are not included in |
582 | - Japanese fonts, it is inconvenient if we treat by a Unicode | |
589 | + Japanese fonts, therefore it is inconvenient if we process by a Unicode | |
583 | 590 | block. Furthermore, JIS~X~0208 is not just a union of Unicode |
584 | 591 | blocks; for example, the intersection of JIS~X~0208 and |
585 | 592 | Latin-1 Supplement is shown in |
@@ -607,14 +614,14 @@ u\pTeX.}. | ||
607 | 614 | |
608 | 615 | %%Example... |
609 | 616 | |
610 | -We note that \LuaTeX-ja offers two additional control sequence, | |
617 | +We note that \LuaTeX-ja offers two additional control sequences, | |
611 | 618 | |\ltjjachar| and |\ltjalchar|. They are similar to |\char| |
612 | - primitive, but |\ltjjachar| always yields a Japanese character (if | |
613 | - the argument is more than or equal to 128) and |\ltjalchar| always | |
619 | + primitive, however |\ltjjachar| always yields a Japanese character, provided that | |
620 | + the argument is more than or equal to 128, and |\ltjalchar| always | |
614 | 621 | yields an alphabetic character, regardless of the argument. |
615 | 622 | |
616 | 623 | \subsection{Default setting of ranges} |
617 | -Patches for plain \TeX\ and \LaTeXe of \LuaTeX-ja predefines 8~character | |
624 | +Patches for plain \TeX\ and \LaTeXe\ of \LuaTeX-ja predefine 8~character | |
618 | 625 | ranges, as shown in Table~\ref{tab-chrrng}. Almost all of these ranges are |
619 | 626 | just the union of Unicode blocks, and determined from the Adobe-Japan1-6 |
620 | 627 | character collection~\cite{aj16}, and JIS~X~0208. Among these 8~ranges, |
@@ -659,19 +666,19 @@ This is because some 8-bit TFMs have a glyph in this range; for example, | ||
659 | 666 | \subsection{Control sequences producing Unicode characters} |
660 | 667 | \label{ssec-unichar} |
661 | 668 | |
662 | -The \emph{fontspec} package\footnote{Preciously | |
663 | -saying, it is the \emph{xunicode} package, originally a package for | |
664 | -\XeTeX and automatically loaded by the \emph{fontspec} package.} offer | |
665 | -various control sequences that produce Unicode characters. However, they as | |
666 | -it stands cannot work with the default range setting of \LuaTeX-ja. For | |
667 | -example, |\textquotedblleft| is just an abbreviation of | |
668 | -|\char"201C\relax| %" | |
669 | -and the character U+201C (LEFT DOUBLE QUOTATION | |
670 | -MARK) is treated as an Japanese character, because it belongs to the | |
671 | -range~3. | |
672 | -This problem is resolved by using |\ltjalchar| instead of the |\char| primitive. | |
673 | -It is included in an optional package named \texttt{luatexja-\penalty0fontspec.sty}. | |
674 | -Figure~\ref{fig-unitxt} ... | |
669 | +The \emph{fontspec} package\footnote{Precisely speaking, it is the | |
670 | +\emph{xunicode} package, originally a package for \XeTeX\ and | |
671 | +automatically loaded by the \emph{fontspec} package.} offers various | |
672 | +control sequences that produce Unicode characters. However, these | |
673 | +control sequences as they stand cannot work correctly with the default | |
674 | +range setting of \LuaTeX-ja. For example, |\textquotedblleft| is just | |
675 | +an abbreviation of |\char"201C\relax|, and the character U+201C (LEFT %" | |
676 | +DOUBLE QUOTATION MARK) is treated as a Japanese character, because it | |
677 | +belongs to the range~3. This problem is resolved by using |\ltjalchar| | |
678 | +instead of the |\char| primitive. It is included in an optional package | |
679 | +named \texttt{luatexja-\penalty0fontspec.sty}. Figure~\ref{fig-unitxt} | |
680 | +shows several ways to typeset a character, both as a Japanese character | |
681 | +and as an alphabetic character. | |
675 | 682 | |
676 | 683 | \begin{figure} |
677 | 684 | \begin{LTXexample} |
@@ -685,7 +692,7 @@ Figure~\ref{fig-unitxt} ... | ||
685 | 692 | \end{figure} |
686 | 693 | |
687 | 694 | The situation looks similar in math formulas, but in fact it differs. |
688 | -Control sequences that represents ordinary symbols defined by the | |
695 | +Each control sequence that represents an ordinary symbol defined by the | |
689 | 696 | \emph{unicode-math} package is just a synonym of a character. For example, |
690 | 697 | the meaning of |\otimes| is just the character U+2297 (CIRCLED TIMES), |
691 | 698 | which is included in the range~3. However, it is difficult to define a |
@@ -693,11 +700,11 @@ control sequence like |\ltjalUmathchar| as a counterpart of | ||
693 | 700 | |\Umathchar|, since an input like `|\sum^\ltjalUmathchar ...|' has to be |
694 | 701 | permitted. |
695 | 702 | |
696 | -However, we couldn't include a solution to this problem in time for this | |
697 | -paper, due to a lack of time. We are just testing a solution that we | |
698 | -will explain it below: | |
703 | +However, we couldn't develop a satisfactory solution to this problem in | |
704 | +time for this paper, due to a lack of time. We are just testing a | |
705 | +solution below: | |
699 | 706 | \begin{itemize} |
700 | -\item \LuaTeX-ja has a list of character codes which will be treated as | |
707 | +\item \LuaTeX-ja has a list of character codes which will always be treated as | |
701 | 708 | alphabetic characters in math mode. Considering 8-bit TFMs for |
702 | 709 | math symbols, this list includes natural numbers between |"80| and |
703 | 710 | |"FF| by default. |
@@ -708,7 +715,7 @@ codes of characters which are mentioned in the \emph{unicode-math} | ||
708 | 715 | \end{itemize} |
709 | 716 | |
710 | 717 | |
711 | -We would like to extend treatments described in this section to 8-bit | |
718 | +We would like to extend treatments described in this subsection to 8-bit | |
712 | 719 | font encodings, but we leave it to further development too. |
713 | 720 | |
714 | 721 | \section{Current status of development} |
@@ -799,7 +806,7 @@ An example output is shown in Figure~\ref{fig-bls}. The left half is the | ||
799 | 806 | baseline of Japanese characters is shifted down. On the other |
800 | 807 | hand, the right half is the output when |
801 | 808 | \textsf{yalbaselineshift} is positive, hence the baseline of |
802 | - alphabetic characters is shifted. Figure~\ref{fig-small} | |
809 | + alphabetic characters is shifted down. Figure~\ref{fig-small} | |
803 | 810 | shows an interesting use of these parameters. |
804 | 811 | |
805 | 812 | \end{description} |
@@ -856,12 +863,12 @@ To work this behavior well, a list of all (alphabetic) encodings defined | ||
856 | 863 | \subsection{Classes for Japanese documents} |
857 | 864 | To produce `high-quality' Japanese documents, we need not only that |
858 | 865 | Japanese characters are correctly placed, but also class files for |
859 | -Japanese documents. In \pTeX, there are two major families of classes: | |
866 | +Japanese documents. Two major families of classes are widely used in Japan: | |
860 | 867 | \emph{jclasses} which is distributed with the official p\LaTeXe\ macros, |
861 | 868 | and \emph{jsclasses}. At the present, \LuaTeX-ja |
862 | 869 | simply contains their counterparts: \emph{ltjclasses} and |
863 | -\emph{ltjsclasses}. However, the policy on classess is not determined | |
864 | -now, and we hope to have another family of classes which are useful in | |
870 | +\emph{ltjsclasses}. However, the policy on classes is not determined | |
871 | +now, and we hope to have another family of classes which are useful for | |
865 | 872 | commercial printing. In the author's opinion, \emph{ltjclasses} is |
866 | 873 | better to stay as an example of porting of class files for \pTeX\ to |
867 | 874 | \LuaTeX-ja. |
@@ -885,18 +892,20 @@ the former two packages. | ||
885 | 892 | control sequences producing Unicode characters. |
886 | 893 | |
887 | 894 | \item[The \emph{otf} package] |
888 | -This package is widely used in \pTeX\ for characters which is | |
895 | +This package is widely used in \pTeX\ for typesetting characters which are | |
889 | 896 | not in JIS~X~0208, and for using more than one weight in \emph{mincho} |
890 | 897 | and \emph{gothic} font families. Therefore \LuaTeX-ja supports features |
891 | 898 | in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty} |
892 | 899 | manually. Note that characters by |\UTF{xxxx}| and |
893 | 900 | |\CID{xxxx}| are not appended to the current list as a |
894 | - \emph{glyph\_node}, so they are not affected by callbacks by | |
895 | - the \emph{luaotfload} package. We have another remark; |\CID| | |
896 | - does not work with TrueType fonts. | |
901 | + \emph{glyph\_node}, so that they bypass callbacks by the | |
902 | + \emph{luaotfload} package. Note also that |\CID| | |
903 | + does not work with TrueType fonts, since |\CID| uses the | |
904 | + conversion table between CID and the glyph order of the | |
905 | + current Japanese font. | |
897 | 906 | |
898 | 907 | \item[The \emph{listings} package] |
899 | -It is known for users of \pTeX that there is a patch |jlisting.sty| for | |
908 | +It is well known among users of \pTeX\ that there is a patch |jlisting.sty| for | |
900 | 909 | the \emph{listings} package, to use Japanese characters in |
901 | 910 | the |lstlisting| environment. Generally speaking, it also can |
902 | 911 | be used in \LuaTeX-ja. However, it seems to be that a |
@@ -905,11 +914,11 @@ It is known for users of \pTeX that there is a patch |jlisting.sty| for | ||
905 | 914 | use the \emph{showexpl} package. |
906 | 915 | |
907 | 916 | There is another way to use characters above 256 with the |
908 | - \emph{listings} package (described in\cite{apl}), however, | |
917 | + \emph{listings} package (described in~\cite{apl}). However, | |
909 | 918 | this method is not suitable for Japanese, since the number of |
910 | 919 | Japanese characters is very large. We hope that the |
911 | - \emph{listings} package will be able to cope with all characters above | |
912 | - 256 in the future. | |
920 | + \emph{listings} package will be able to handle all characters above | |
921 | + 256 without any patch, in the future. | |
913 | 922 | |
914 | 923 | |
915 | 924 | \end{description} |
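The remarks on the \emph{otf} package above can be illustrated by a small document. This is only a sketch under stated assumptions: the class name \texttt{ltjsarticle} (taken to be one of the \emph{ltjsclasses}) and the two code points are chosen for illustration and do not appear in the diff, and |\CID| is assumed to need a non-TrueType Japanese font, as noted above.

```latex
% Minimal sketch, to be compiled with lualatex.
% Assumption: ltjsarticle is one of the ltjsclasses and loads LuaTeX-ja itself.
\documentclass{ltjsarticle}
% The otf-compatible features are not loaded automatically; load them manually.
\usepackage{luatexja-otf}
\begin{document}
% \UTF takes a hexadecimal Unicode code point (illustrative value).
森\UTF{9DD7}外

% \CID takes a CID number (illustrative value); per the text above,
% this does not work when the current Japanese font is a TrueType font.
\CID{7652}飾区
\end{document}
```

Since characters produced by |\UTF| and |\CID| are not appended as a \emph{glyph\_node}, font features applied by \emph{luaotfload} callbacks should not be expected to affect them.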
@@ -917,10 +926,11 @@ There is another way to use characters above 256 with the | ||
917 | 926 | |
918 | 927 | |
919 | 928 | \section{Implementation} |
929 | +\label{sec:implementation} | |
920 | 930 | \subsection{Handling of Japanese fonts} |
921 | 931 | In \pTeX, there are three slots for maintaining current fonts, namely |
922 | -|\font| for alphabetic fonts, |\jfont| for Japanese font (in horizontal | |
923 | -direction) and |\tfont| for Japanese font (in vertical direction). With | |
932 | +|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal | |
933 | +direction) and |\tfont| for Japanese fonts (in vertical direction). With | |
924 | 934 | these slots, we can manage the current font for alphabetic characters |
925 | 935 | and that for Japanese characters separately in \pTeX. However, \LuaTeX\ |
926 | 936 | has only one slot for maintaining the current font, as in \TeX82. This |
@@ -947,7 +957,7 @@ they cannot be an argument of |\the|, |\fontname|, nor |\textfont|. | ||
947 | 957 | |
948 | 958 | Callbacks by the \emph{luaotfload} package, e.g.,~replacement of glyphs |
949 | 959 | according to font features, are executed just after `Examination of |
950 | -Stack Level' (see Subsection~\ref{ssec-over}). Note that calculation of | |
960 | +Stack Level' (see Subsections \ref{ssec-over}~and~\ref{ssec-stack}). Note that calculation of | |
951 | 961 | character classes for each Japanese character is done \emph{after} |
952 | 962 | these callbacks for now. |
953 | 963 |
@@ -955,10 +965,10 @@ these callbacks for now. | ||
955 | 965 | \label{ssec-stack} |
956 | 966 | |
957 | 967 | As we noted in Subsection~\ref{ssec-csname}, parameters whose values |
958 | -at the end of a horizontal box or that of a paragraph are effective in | |
968 | +at the end of a horizontal box or of a paragraph are valid in the | |
959 | 969 | whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented |
960 | 970 | by internal integers or registers of other types in \TeX. We explain it |
961 | -in this section. | |
971 | +in this subsection. | |
962 | 972 | |
963 | 973 | \begin{figure} |
964 | 974 | \begin{lstlisting} |
@@ -1039,7 +1049,7 @@ needed. In the context of \pTeX, this process was performed using virtual fonts. | ||
1039 | 1049 | On the other hand, Lua\TeX-ja does the adjustment by encapsulating a glyph |
1040 | 1050 | into a horizontal box. There are two main reasons why we adopted this |
1041 | 1051 | method; one is that we feared the Lua code needed to coexist with callbacks by |
1042 | -|luaotfload| package would be large if we use virtual fonts, and the | |
1052 | +the |luaotfload| package would be large if we use virtual fonts, and the | |
1043 | 1053 | other is to cope with shifting of the baseline of characters at the |
1044 | 1054 | same time. |
1045 | 1055 |
@@ -1093,29 +1103,32 @@ same time. | ||
1093 | 1103 | \end{figure} |
1094 | 1104 | |
1095 | 1105 | Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is |
1096 | -the imaginary body which is specified in the metric, and a vertical | |
1106 | +the imaginary body specified in the metric, and a vertical | |
1097 | 1107 | rectangle is the imaginary body of a real glyph. First, the real glyph |
1098 | 1108 | is aligned with respect to the width of $M$. In the figure, the real |
1099 | 1109 | glyph is aligned `middle'; this setting is useful for the full-width |
1100 | -middle dot `・'. We have other settings, namely, `left' and `right'. | |
1110 | +middle dot `・'. We have other settings, `left' and `right'. | |
1101 | 1111 | After that, it is shifted according to the values of |left| and |down|, |
1102 | -which are specified in the metric. The final position of the real glyph | |
1112 | +which are specified in the metric, too. The final position of the real glyph | |
1103 | 1113 | is shown by the gray rectangle~$R$. If the amount of shifting the baseline is |
1104 | 1114 | not zero, $M$ (and hence the real glyph) is shifted by that amount. |
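The adjustment described above can be summarized as a formula. This is only a sketch: the symbols $w_M$, $w_g$, $a$, $x$ and $y$ are introduced here for illustration, and the sign convention (a positive |left| moves the glyph to the left, a positive |down| moves it downward) is an assumption, not something stated in the text.

```latex
% Sketch only; requires amsmath for cases and \text.
% w_M: width of the imaginary body M;  w_g: width of the real glyph.
% a:   horizontal offset of the glyph's left edge from the left edge of M
%      after the alignment step.
% x,y: final offset of the glyph relative to M (x to the right, y upward),
%      assuming positive `left' shifts leftward and positive `down' downward.
\[
  a =
  \begin{cases}
    0             & \text{(`left')}\\
    (w_M - w_g)/2 & \text{(`middle')}\\
    w_M - w_g     & \text{(`right')}
  \end{cases}
  \qquad
  x = a - \mathit{left}, \quad
  y = -\mathit{down}
\]
```

A nonzero baseline shift is applied on top of this, moving $M$ and the glyph inside it together.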
1105 | 1115 | |
1106 | -We would like to remark briefly about the vertical position of a glyph. | |
1107 | -A JFM (or the metric used in \LuaTeX-ja) and the real font used for it | |
1108 | -may have different height or depth. In that case, it may look better if | |
1109 | -the real glyph is shifted vertically to match the height-depth ratio | |
1110 | -specified in the metric. This situation is carefully studied by | |
1116 | +We would like to remark briefly on the vertical position of a real | |
1117 | +glyph. A JFM (or a metric used in \LuaTeX-ja) and a real font used for | |
1118 | +it may have different height or depth. In that case, it may look better | |
1119 | +if the real glyph is shifted vertically to match the height-depth ratio | |
1120 | +specified in the metric; however, no vertical adjustment other than the | |
1121 | +adjustment by the |down| value is performed in the present | |
1122 | +implementation of \LuaTeX-ja. This situation is carefully studied by | |
1111 | 1123 | Otobe~\cite{min10}. The policy on this problem is not yet determined; |
1112 | -now, however we would like to offer several solutions in future development. | |
1124 | +however, we would like to offer several solutions in future | |
1125 | +development. | |
1113 | 1126 | |
1114 | 1127 | \section{Conclusion} |
1115 | 1128 | We have discussed our \LuaTeX-ja package, which is much influenced |
1116 | 1129 | by \pTeX. For now, it is suitable for experimental use; however, there |
1117 | 1130 | are many refinements needed for regular use. The author hopes |
1118 | -that this paper and this project contribute the typesetting Japanese, | |
1131 | +that this paper and the \LuaTeX-ja project contribute to typesetting Japanese, | |
1119 | 1132 | and possibly other Asian languages, under \LuaTeX. |
1120 | 1133 | |
1121 | 1134 | \section*{Acknowledgements} |