
luatexja: Commit

Source code repository


Commit metadata

Revision: c85b9aa3a044213f560857e22da5b95f591a6f33 (tree)
Date: 2011-11-21 20:01:35
Author: Hironori Kitagawa <h_kitagawa2001@yaho...>
Committer: Hironori Kitagawa

Log message

Updated the draft for post-proceedings.

Change summary

Diff

--- a/doc/ajt-devel-ltja.tex
+++ b/doc/ajt-devel-ltja.tex
@@ -74,21 +74,22 @@ internal processing methods of \LuaTeX-ja.
7474 To typeset Japanese documents with \TeX, ASCII \pTeX~\cite{ptex} has
7575 been widely used in Japan. There are other methods---for example, using
7676 Omega and OTP~\cite{omega}, or with the CJK package---to do so, however,
77-these alternative methods did not become a majority. The author thinks
77+these alternative methods did not become majority. The author thinks
7878 that this is because \pTeX\ enables us to produce high-quality documents
7979 (e.g.,~supporting vertical typesetting), and the appearance of \pTeX\ is
8080 earlier than that of alternatives described above.
8181
82-However, \pTeX\ has been left behind from the extensions of \TeX\
83-such as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In
84-recent years, the situation has become better, because of development
85-of |ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
82+However, \pTeX\ has been left behind from the extensions of \TeX\ such
83+as \eTeX\ and \pdfTeX, and the diffusion of UTF-8 encoding. In recent
84+years, the situation has become better, by development of
85+|ptexenc|~\cite{ptexenc} by Nobuyuki Tsuchimura (\hbox{土村展之}),
8686 $\varepsilon$-\pTeX~\cite{eptex} by the author,~and u\pTeX~\cite{uptex}
87-by Takuji Tanaka (田中琢爾). However, continuing this approach, namely, to develop
88-an engine extension localized for Japanese, is not wise. This approach
89-needs lots of work for \emph{each} engine, and since \LuaTeX\ has an ability
90-to hook \TeX's internal process by using Lua callbacks, the necessity of
91-an engine extension is getting smaller.
87+by Takuji Tanaka (田中琢爾). However, continuing this approach, namely,
88+to develop an engine extension localized for Japanese, is not wise. This
89+approach needs lots of work for \emph{each} engine. In addition, if we
90+use \LuaTeX, the necessity of an engine extension is getting smaller
91+because \LuaTeX\ has an ability to hook \TeX's internal process by using
92+Lua callbacks.
9293
9394
9495 There were several experimental attempts to typeset
@@ -111,18 +112,18 @@ these situations.
111112
112113 \subsection{Development policy of \LuaTeX-ja}
113114 \label{ssec-pol}
114-The first aim of \LuaTeX-ja project is to implement features (from the
115-`primitive' level) of \pTeX\ as macros under \LuaTeX, so \LuaTeX-ja is
116-much affected by \pTeX. However, as development proceeds, some
117-technical/conceptual difficulties are arisen. Hence we changed the aim
115+The first aim of \LuaTeX-ja project was to implement features (from the
116+`primitive' level) of \pTeX\ as macros under \LuaTeX, therefore \LuaTeX-ja is
117+much affected by \pTeX. However, as development proceeded, some
118+technical/conceptual difficulties arose. Hence we changed the aim
118119 of the project as follows:
119120 \begin{itemize}
120121 \item\emph{\LuaTeX-ja offers at least the same flexibility of
121122 typesetting that p\TeX\ has.}
122123
123- We think that the ability of producing outputs conformed to
124+ We are not satisfied with the ability of producing outputs conformed to
124125 JIS~X~4051~\cite{jisx4051}, the Japanese Industrial Standard for
125- typesetting, or to a technical note~\cite{w3c} by W3C is not enough;
126+ typesetting, or to a technical note~\cite{w3c} by W3C;
126127 if one wants to produce very incoherent outputs for some reason, it
127128 should be possible.
128129 In this point, previous attempts of Japanese typesetting with \LuaTeX\
@@ -144,59 +145,66 @@ In this point, previous attempts of Japanese typesetting with \LuaTeX\
144145 \subsection{Overview of the processes}
145146 \label{ssec-over}
146147 We describe an outline of \LuaTeX-ja's process in order.
148+
147149 \begin{itemize}
148150 \item In the |process_input_buffer| callback: treatment of breaking
149151 lines after a Japanese character (in Subsection~\ref{ssec-line}).
150152
151153 \item In the |hyphenate| callback: font replacement.
152154
153-\LuaTeX-ja looks into for each \textit{glyph\_node}~$p$ in the list. If
155+\LuaTeX-ja looks into each \textit{glyph\_node}~$p$ in the horizontal list. If
154156 the character represented by $p$ is considered as a Japanese
155- character, the font used in $p$ is replaced by the value of
157+ character, the font used at $p$ is replaced by the value of
156158 |\ltj@curjfnt|, an attribute for `the current Japanese font'
157159 at~$p$.
158160
159-Furthermore the subtype of $p$ is subtracted by 1 to suppress
160- hyphenation around it by \LuaTeX, because later processes of
161+Furthermore, the subtype of $p$ is subtracted by 1 to suppress
162+ hyphenation around $p$ by \LuaTeX, because later processes of
161163 \LuaTeX-ja take care of all things about Japanese characters.
162164
163165 \item In |pre_linebreak_filter| and |hpack_filter| callbacks:
164166
165167 \begin{enumerate}
166168 \item \LuaTeX-ja has its own stack system, and the current horizontal
167- list is traversed in this stage to determine what is the level of
168- \LuaTeX-ja's internal stack at the end of the list (in
169- Subsection~\ref{ssec-stack}).
169+ list is traversed in this stage to determine what the level of
170+ \LuaTeX-ja's internal stack at the end of the list is. We will
171+ discuss it in Subsection~\ref{ssec-stack}.
170172
171173 \item In this stage, \LuaTeX-ja inserts glues/kerns for Japanese
172- typesetting in the list. This is the core of \LuaTeX-ja (in
173- Subsection~\ref{ssec-jglue}).
174+ typesetting in the list. This is the core routine of \LuaTeX-ja.
175+ We will discuss it in Subsections
176+ \ref{ssec-jglue}~and~\ref{ssec-jspec}.
174177
175178 \item To make a match between a metric and a real font, sometimes
176- adjustument of the position of (Japanese) glyphs are performed
177- (Subsection~\ref{ssec-width}).
179+ adjustment of the position of (Japanese) glyphs is performed.
180+ We will discuss it in Subsection~\ref{ssec-width}.
178181 \end{enumerate}
179-\item In the |mlist_to_hlist| callback: replacement of Japanese characters in math formulas.
180-This stage is similar to adjustument of the position of glyphs (see
181- above), so we omit it from this paper.
182+\item In the |mlist_to_hlist| callback: treatment of Japanese characters
183+ in math formulas. This stage is similar to adjustment of the
184+ position of glyphs (see above), so we omit the description of this stage
185+ from this paper.
182186 \end{itemize}
183187
188+In this paper, an \emph{alphabetic character} means a non-Japanese
189+character. Similarly, we use the term \emph{alphabetic font} as the
190+counterpart of a Japanese font.
191+
184192 \subsection{Contents of this paper}
185193 Here we describe the contents of the rest of this paper briefly. In
186-Section~\ref{sec:differences_with_ptex},
187-we describe major differences between \pTeX\ and \LuaTeX-ja.
188-The next section, Section~\ref{sec:distinction_of_characters},
189-is concentrated on a problem `how we
190-distinguish between Japanese characters and alphabetic characters'. In
191-Section~\ref{sec:current_status}, we show rest of features of \LuaTeX-ja package, and
192-current status of the package. Finally, in Section~\ref{sec:implementation}, we describe some
193-internal routines of \LuaTeX-ja.
194+Section~\ref{sec:differences_with_ptex}, we describe major differences
195+between \pTeX\ and \LuaTeX-ja. The next section,
196+Section~\ref{sec:distinction_of_characters}, concentrates on the
197+problem of how we distinguish between Japanese characters and alphabetic
198+characters. In Section~\ref{sec:current_status}, we show the current
199+development status of the package. Finally, in
200+Section~\ref{sec:implementation}, we describe some internal routines of
201+\LuaTeX-ja.
194202
195203 \subsection{General information of the project}
196204 This \LuaTeX-ja project is hosted by SourceForge.jp. The official wiki
197205 is located on
198206 \url{http://sourceforge.jp/projects/luatex-ja/wiki/}. There is
199-no stable version on October 15, 2011, however a set of developer sources can be
207+no stable version as of October 22, 2011; however, a set of developer sources can be
200208 obtained from the git repository. Members of the project team are as follows
201209 (in random order): Hironori Kitagawa, Kazuki Maeda, Takayuki Yato,
202210 Yusuke Kuroki, Noriyuki Abe, Munehiro Yamamoto, Tomoaki Honda,
@@ -212,7 +220,7 @@ overview of \pTeX, please see Okumura~\cite{ptexjp}.
212220
213221 \subsection{Names of control sequences}
214222 \label{ssec-csname} Because \pTeX\ is an engine modification of Knuth's
215-original \TeX82 engine, some primitives added by it take a form that is
223+original \TeX82 engine, some of the additional primitives take a form that is
216224 very difficult to be simulated by a macro. For example, an additional
217225 primitive |\prebreakpenalty|$\langle\hbox{\it
218226 char\_code}\rangle$|[=]|$\langle\hbox{\it penalty}\rangle$ in \pTeX\
@@ -221,21 +229,19 @@ $\langle\hbox{\it char\_code}\rangle$ to $\langle\hbox{\it
221229 penalty}\rangle$, and this form |\prebreakpenalty|$\langle\hbox{\it
222230 char\_code}\rangle$ can be also used for retrieving the value.
223231
224-Moreover, there are some parameters which values of them at the end of a
225-horizontal box or that of a paragraph are effective in whole box or
226-paragraph. These parameters were implemented as additional internal
227-parameters in \pTeX. However, the implementation of these parameters in
228-\LuaTeX-ja is not so easy; we will discuss it in
229-Subsection~\ref{ssec-stack}.
232+Moreover, there are some internal parameters of \pTeX\ whose values at the end of a
233+horizontal box or of a paragraph are valid in the whole box or
234+paragraph. However, the implementation of these parameters in
235+\LuaTeX-ja is not so easy; we will discuss it in Subsection~\ref{ssec-stack}.
230236
231-From above two~problems we discussed above, the assignment and retrieval
237+From the two~problems discussed above, the assignment and retrieval
232238 of most parameters in \LuaTeX-ja are summarized into the following
233239 three~control sequences:
234240 \begin{itemize}
235241 \item |\ltjsetparameter{|$\langle\hbox{\it
236242 name}\rangle$|=|$\langle\hbox{\it value}\rangle$|,...}|: for local
237243 assignment.
238-\item |\ltjglobalsetparameter|: for global assignment. These two control
244+\item |\ltjglobalsetparameter|: for global assignment. Note that these two control
239245 sequences obey the value of |\globaldefs| primitive.
240246 \item |\ltjgetparameter{|$\langle\hbox{\it
241247 name}\rangle$|}[{|$\langle\hbox{\it optional
@@ -272,7 +278,7 @@ letter `あ' will be treated as an alphabetic character by
272278 \LuaTeX-ja. Then, it is natural to have a space between `あ' and `y' in
273279 the output, where the actual output in the figure does not so. This is
274280 because `あ' is considered a Japanese character by \LuaTeX-ja,
275-when \LuaTeX-ja does a decision whether U+FFFFF will be added to the
281+when \LuaTeX-ja makes the decision whether U+FFFFF will be added to the
276282 input line~2.
277283
278284 \begin{figure}
@@ -295,7 +301,7 @@ JFMs are essentially same, and only differ in their names. For example,
295301 |min10.tfm| and |goth10.tfm|, which are JFMs shipped with \pTeX\ for
296302 seriffed \emph{mincho} family and sans-seriffed \emph{gothic} family,
297303 differ their |FAMILY| and |FACE| only. Moreover, |jis.tfm| and
298-|jisg.tfm|, which consists a parts of \emph{jis} font metric, which is
304+|jisg.tfm|, which are included in the \emph{jis} font metric, which is
299305 used in \emph{jsclasses}~\cite{jsclasses} by Haruhiko Okumura (奥村晴彦),
300306 are totally same as binary files. Considering this situation, we
301307 decided to separate `real' fonts and metrics used for them in
@@ -305,14 +311,14 @@ remarks:
305311 \begin{itemize}
306312 \item A control sequence |\jfont| must be used for Japanese fonts, instead of |\font|.
307313 \item \LuaTeX-ja automatically loads the \emph{luaotfload} package, so
308- |file:| and |name:| prefixes, and various font features can be
309- used as the line~1 in Figure~\ref{fig-jfdef}.
314+ \hbox{\tt file:} and \hbox{\tt name:} prefixes, and various font features can be
315+ used as the first line in Figure~\ref{fig-jfdef}.
310316 \item The |jfm| key specifies the metric for the font. In
311317 Figure~\ref{fig-jfdef}, both fonts will use a metric stored in a
312318 Lua script named |jfm-ujis.lua|. This metric is the standard
313319 metric in \LuaTeX-ja, and is based on JFMs used in the \emph{otf}
314320 package~\cite{otf}.
315-\item The |psft:| prefix can be used to specify name-only, non-embedded
321+\item The \hbox{psft:} prefix can be used to specify name-only, non-embedded
316322 fonts. When one display a pdf with these fonts, actual fonts which
317323 will be used for them depend on a pdf reader.
318324 \end{itemize}
@@ -326,7 +332,7 @@ metrics by default; |jfm-ujis.lua|, |jfm-jis.lua| based on the
326332 \emph{jis} font metric, and |jfm-min.lua| based on old |min10.tfm|.
327333
328334 Note that |-kern| in features
329-is important, because kerning information from real font itself will
335+is important, because kerning information from a real font itself will
330336 clash with glue/kern information from the metric.
331337
332338 \begin{figure}
@@ -351,7 +357,7 @@ process will be done when a horizontal box or a paragraph is ended, so
351357
352358 The situation for Japanese characters is more complicated.
353359 Glues (and kerns) which are needed for Japanese
354-typesetting will be divided into the following three categories:
360+typesetting are divided into the following three categories:
355361 \begin{itemize}
356362 \item Glue (or kern) from the metric of Japanese fonts (\emph{JFM glue},
357363 for short).
@@ -385,6 +391,8 @@ this specification are to behave like alphabetic characters in \LuaTeX\
385391 for \LuaTeX-ja's process.
386392
387393 \subsection{Insertion of glues/kerns for Japanese typesetting: specification}
394+\label{ssec-jspec}
395+
388396 \begin{table}
389397 \caption{Examples of differences between \pTeX\ and \LuaTeX-ja.}
390398 \label{tab-jfmglue}
@@ -422,16 +430,16 @@ Now we will take a look inside the insertion process itself, and describe 4~poin
422430 \begin{description}
423431 \item[Ignored Nodes]
424432 As noted in the previous subsection, the insertion process in \pTeX\ can
425- be interrupted by saying |{}| or anything else\footnote{This
433+ be interrupted by saying |{}| or anything else.\footnote{This
426434 is why some tricks like \texttt{ちょ\char`\{\char`\}っと} for
427- \texttt{min10.tfm} and other `old' JFMs work.}. This leads
428- the second row in Table~\ref{tab-jfmglue}, or
429- Figure~\ref{fig-ptexjfm}. `The process is interrupted' means
430- that \pTeX\ does not think the letter `】\inhibitglue' is
431- followed by `\inhibitglue【', hence two half-width glues are
432- inserted between between `】\inhibitglue' and `\inhibitglue【',
433- where one is from `】\inhibitglue' and another is from
434- `\inhibitglue【'.
435+ \texttt{min10.tfm} and other `old' JFMs work.} This leads the
436+ second row in Table~\ref{tab-jfmglue}, or
437+ Figure~\ref{fig-ptexjfm}. Here `the process is interrupted'
438+ means that \pTeX\ does not think the letter `】\inhibitglue'
439+ is followed by `\inhibitglue【', hence two half-width glues
440+ are inserted between `】\inhibitglue' and `\inhibitglue【',
441+ where the left one is from `】\inhibitglue' and the right one
442+ is from `\inhibitglue【'.
435443
436444 On the other hand, in \LuaTeX-ja, the process is done inside
437445 |hpack_filter| and |pre_linebreak_filter| callbacks. Hence,
@@ -444,14 +452,14 @@ As noted in the previous subsection, the insertion process in \pTeX\ can
444452 \emph{penalty\_node}---, as shown in (4).
445453
446454
447-By the way, around a \emph{glyph\_node} $p$ there may be some nld odes
455+By the way, around a \emph{glyph\_node} $p$ there may be some nodes
448456 attached to $p$. These are an accent and kerns for
449- positioning it, and a kern from the italic
457+ moving it to the right place, and a kern from the italic
450458 correction\footnote{\TeX82 (and \LuaTeX) does not distinguish
451459 between explicit kern and a kern for italic correction. To
452- distinguish them, an additional subtype for kern is introduced
460+ distinguish them, an additional subtype for a kern is introduced
453461 in \pTeX. On the other hand, \LuaTeX-ja uses an additional attribute and
454- redefines \texttt{\char`\\/}.} for $p$. It is natural that
462+ redefines \texttt{\char`\\/} to set this attribute.} for $p$. It is natural that
455463 these attachments should be ignored inside the process. Hence
456464 \LuaTeX-ja takes this approach, as the latest version of
457465 \pTeX\ (p3.2). This explains (2) in the figure.
@@ -485,7 +493,7 @@ However this seems to be unnatural, since two Japanese fonts in the
485493 \mc 明朝)\gt (ゴシック
486494 \end{quote}
487495 One might have the situation that this default behavior is not
488- suitable. \LuaTeX-ja offers a way to cope with this case, but
496+ suitable. \LuaTeX-ja offers a way to handle this situation, but
489497 we leave it to the manual~\cite{man}.
490498
491499 \item[Fonts with Different Metrics]
@@ -503,9 +511,9 @@ As the previous paragraph, this input yields the following, by \pTeX:
503511 \mc 漢)\hbox{}\gt (漢)\hbox{}\large (大
504512 \end{quote}
505513 We thought that amounts of spaces between parentheses in above output
506- are too much. So we changed the default behavior of
507- \LuaTeX-ja so that the amount of a glue between two Japanese
508- characters with different metrics is the average of a glue
514+ are too much. Hence we changed the default behavior of
515+ \LuaTeX-ja, so that the amount of a glue between two Japanese
516+ characters with different metrics is the \emph{average} of a glue
509517 from the left character and that from the right
510518 character. For example, Figure~\ref{fig-diffmet} shows the
511519 output from above input. The width of glue indicated `(1)' is
@@ -538,33 +546,32 @@ We thought that amounts of spaces between parentheses in above output
538546
539547 \item[\emph{kanjiskip} and \emph{xkanjiskip}]
540548 In \pTeX, the value of \emph{xkanjiskip} is controlled by a skip named
541- |\xkanjiskip|. A defect of this implementation is that the
542- value of \emph{xkanjiskip} is not connected with the size of
543- the currnt Japanese font. It seems that |EXTRASPACE|,
549+ |\xkanjiskip|. A well-known defect of this implementation is
550+ that the value of \emph{xkanjiskip} is not connected with the
551+ size of the current Japanese font. It seems that |EXTRASPACE|,
544552 |EXTRASTRETCH|, |EXTRASHRINK| parameters in a JFM are
545553 reserved for specifying the default value of
546554 \emph{xkanjiskip} in a unit of the design size, but \pTeX\
547- did not use these parameters.
555+ did not actually use these parameters.
548556
549557 Considering this situation of p\TeX, \LuaTeX-ja can use the value of
550558 \emph{xkanjiskip} that specified in a metric. If the value of
551- \emph{xkanjiskip} on user side (this is the
552- \textsf{xkanjiskip} parameter in |\ltjsetparameter|) is
559+ \emph{xkanjiskip} on user side (this is the value of
560+ \textsf{xkanjiskip} parameter of |\ltjsetparameter|) is
553561 |\maxdimen|, then \LuaTeX-ja use the specification from
554562 the current used metric as the actual value of
555- \emph{xkanjiskip}.
556-This description also applies for \emph{kanjiskip}.
563+ \emph{xkanjiskip}. This description also applies for \emph{kanjiskip}.
557564 \end{description}
558565
559566 \section{Distinction of characters}
560-\label{sec:distinction_of_characters}
561-Since \LuaTeX\ can handle Unicode characters natively, it is a major
562-problem that how we distinguish Japanese characters and alphabetic
563-characters. For example, the multiplication sign (U+00D7) exists both in
564-ISO-8859-1 (hence in Latin-1 Supplement in Unicode) and in the basic
565-Japanese character set JIS~X~0208. It is not desirable that this
566-character is treated as an alphabetic char, because this symbol is often
567-used in the sense of `negative' in Japan.
567+\label{sec:distinction_of_characters} Since \LuaTeX\ can handle Unicode
568+characters natively, a major problem is how we distinguish
569+Japanese characters and alphabetic characters. For example, the
570+multiplication sign (U+00D7) exists both in ISO-8859-1 (hence in Latin-1
571+Supplement in Unicode) and in the basic Japanese character set
572+JIS~X~0208. It is not desirable that this character is always treated as
573+an alphabetic character, because this symbol is often used in the sense
574+of `negative' in Japan.
568575
569576 \subsection{Character ranges}
569576 Before we describe the approach taken in \LuaTeX-ja, we review the
@@ -573,13 +580,13 @@ approach taken by u\pTeX. u\pTeX\ extends the |\kcatcode| primitive in
573580 among alphabetic characters~(15), \emph{kanji}~(16), \emph{kana}~(17),
574581 \emph{kanji}, \emph{Hangul}~(17), or~\emph{other CJK characters}~(18).
575582 The assignment to |\kcatcode| can be done by a Unicode
576-block\footnote{There are some exceptions. For example, U+FF00--FFEF
583+block.\footnote{There are some exceptions. For example, U+FF00--FFEF
577584 (Halfwidth and Fullwidth Forms) are divided into three blocks in recent
578-u\pTeX.}.
585+u\pTeX.}
579586
580587 \LuaTeX-ja adopted a different approach. There are many Unicode blocks
581588 in Basic Multilingual Plane which are not included in
582- Japanese fonts, it is inconvenient if we treat by a Unicode
589+ Japanese fonts, therefore it is inconvenient if we process by a Unicode
583590 block. Furthermore, JIS~X~0208 are not just union of Unicode
584591 blocks; for example, the intersection of JIS~X~0208 and
585592 Latin-1 Supplement is shown in
@@ -607,14 +614,14 @@ u\pTeX.}.
607614
608615 %%Example...
609616
610-We note that \LuaTeX-ja offers two additional control sequence,
617+We note that \LuaTeX-ja offers two additional control sequences,
611618 |\ltjjachar| and |\ltjalchar|. They are similar to |\char|
612- primitive, but |\ltjjachar| always yields a Japanese character (if
613- the argument is more than or equal to 128) and |\ltjalchar| always
619+ primitive, however |\ltjjachar| always yields a Japanese character, provided that
620+ the argument is more than or equal to 128, and |\ltjalchar| always
614621 yields an alphabetic character, regardless of the argument.
615622
616623 \subsection{Default setting of ranges}
617-Patches for plain \TeX\ and \LaTeXe of \LuaTeX-ja predefines 8~character
624+Patches for plain \TeX\ and \LaTeXe\ of \LuaTeX-ja predefine 8~character
618625 ranges, as shown in Table~\ref{tab-chrrng}. Almost all of these ranges are
619626 just the union of Unicode blocks, and determined from the Adobe-Japan1-6
620627 character collection~\cite{aj16}, and JIS~X~0208. Among these 8~ranges,
@@ -659,19 +666,19 @@ This is because some 8-bit TFMs have a glyph in this range; for example,
659666 \subsection{Control sequences producing Unicode characters}
660667 \label{ssec-unichar}
661668
662-The \emph{fontspec} package\footnote{Preciously
663-saying, it is the \emph{xunicode} package, originally a package for
664-\XeTeX and automatically loaded by the \emph{fontspec} package.} offer
665-various control sequences that produce Unicode characters. However, they as
666-it stands cannot work with the default range setting of \LuaTeX-ja. For
667-example, |\textquotedblleft| is just an abbreviation of
668-|\char"201C\relax| %"
669-and the character U+201C (LEFT DOUBLE QUOTATION
670-MARK) is treated as an Japanese character, because it belongs to the
671-range~3.
672-This problem is resolved by using |\ltjalchar| instead of the |\char| primitive.
673-It is included in an optional package named \texttt{luatexja-\penalty0fontspec.sty}.
674-Figure~\ref{fig-unitxt} ...
669+The \emph{fontspec} package\footnote{Precisely speaking, it is the
670+\emph{xunicode} package, originally a package for \XeTeX\ and
671+automatically loaded by the \emph{fontspec} package.} offers various
672+control sequences that produce Unicode characters. However, these
673+control sequences as they stand cannot work correctly with the default
674+range setting of \LuaTeX-ja. For example, |\textquotedblleft| is just
675+an abbreviation of |\char"201C\relax|, and the character U+201C (LEFT %"
676+DOUBLE QUOTATION MARK) is treated as a Japanese character, because it
677+belongs to the range~3. This problem is resolved by using |\ltjalchar|
678+instead of the |\char| primitive. It is included in an optional package
679+named \texttt{luatexja-\penalty0fontspec.sty}. Figure~\ref{fig-unitxt}
680+shows several ways to typeset a character, both as a Japanese character
681+and as an alphabetic character.
675682
676683 \begin{figure}
677684 \begin{LTXexample}
@@ -685,7 +692,7 @@ Figure~\ref{fig-unitxt} ...
685692 \end{figure}
686693
687694 The situation looks similar in math formulas, but in fact it differs.
688-Control sequences that represents ordinary symbols defined by the
695+Each control sequence that represents an ordinary symbol defined by the
689696 \emph{unicode-math} package is just synonym of a character. For example,
690697 the meaning of |\otimes| is just the character U+2297 (CIRCLED TIMES),
691698 which is included in the range~3. However, it is difficult to define a
@@ -693,11 +700,11 @@ control sequence like |\ltjalUmathchar| as a counterpart of
693700 |\Umathchar|, since an input like `|\sum^\ltjalUmathchar ...|' has to be
694701 permitted.
695702
696-However, we couldn't include a solution to this problem in time for this
697-paper, due to a lack of time. We are just testing a solution that we
698-will explain it below:
703+However, we could not develop a satisfactory solution to this problem in
704+time for this paper. We are currently testing the
705+solution below:
699706 \begin{itemize}
700-\item \LuaTeX-ja has a list of character codes which will be treated as
707+\item \LuaTeX-ja has a list of character codes which will always be treated as
701708 alphabetic characters in math mode. Considering 8-bit TFMs for
702709 math symbols, this list includes natural numbers between |"80| and
703710 |"FF| by default.
@@ -708,7 +715,7 @@ codes of characters which are mentioned in the \emph{unicode-math}
708715 \end{itemize}
709716
710717
711-We would like to extend treatments described in this section to 8-bit
718+We would like to extend treatments described in this subsection to 8-bit
712719 font encodings, but we leave it to further development too.
713720
714721 \section{Current status of development}
@@ -799,7 +806,7 @@ An example output is shown in Figure~\ref{fig-bls}. The left half is the
799806 baseline of Japanese characters is shifted down. On the other
800807 hand, the right half is the output when
801808 \textsf{yalbaselineshift} is positive, hence the baseline of
802- alphabetic characters is shifted. Figure~\ref{fig-small}
809+ alphabetic characters is shifted down. Figure~\ref{fig-small}
803810 shows an interesting use of these parameters.
804811
805812 \end{description}
@@ -856,12 +863,12 @@ To work this behavior well, a list of all (alphabetic) encodings defined
856863 \subsection{Classes for Japanese documents}
857864 To produce `high-quality' Japanese documents, we need not only that
858865 Japanese characters are correctly placed, but also class files for
859-Japanese documents. In \pTeX, there are two major families of classes:
866+Japanese documents. Two major families of classes are widely used in Japan:
860867 \emph{jclasses} which is distributed with the official p\LaTeXe\ macros,
861868 and \emph{jsclasses}. At the present, \LuaTeX-ja
862869 simply contains their counterparts: \emph{ltjclasses} and
863-\emph{ltjsclasses}. However, the policy on classess is not determined
864-now, and we hope to have another family of classes which are useful in
870+\emph{ltjsclasses}. However, the policy on classes is not determined
871+now, and we hope to have another family of classes which are useful for
865872 commercial printing. In the author's opinion, \emph{ltjclasses} is
866873 better to stay as an example of porting of class files for \pTeX\ to
867874 \LuaTeX-ja.
@@ -885,18 +892,20 @@ the former two packages.
885892 control sequences producing Unicode characters.
886893
887894 \item[The \emph{otf} package]
895+This package is widely used in \pTeX\ for typesetting characters which are
895+This package is widely used in \pTeX\ for typesetting characters which is
889896 not in JIS~X~0208, and for using more than one weight in \emph{mincho}
890897 and \emph{gothic} font families. Therefore \LuaTeX-ja supports features
891898 in the \emph{otf} package, by loading \texttt{luatexja-\penalty0otf.sty}
892899 manually. Note that characters by |\UTF{xxxx}| and
893900 |\CID{xxxx}| are not appended to the current list as a
894- \emph{glyph\_node}, so they are not affected by callbacks by
895- the \emph{luaotfload} package. We have another remark; |\CID|
896- does not work with TrueType fonts.
901+ \emph{glyph\_node}, to avoid callbacks by the
902+ \emph{luaotfload} package. We have another remark; |\CID|
903+ does not work with TrueType fonts, since |\CID| uses the
904+ conversion table between CID and the glyph order of the
905+ current Japanese font.
897906
898907 \item[The \emph{listings} package]
899-It is known for users of \pTeX that there is a patch |jlisting.sty| for
908+It is known for users of \pTeX\ that there is a patch |jlisting.sty| for
900909 the \emph{listings} package, to use Japanese characters in
901910 the |lstlisting| environment. Generally speaking, it also can
902911 be used in \LuaTeX-ja. However, it seems to be that a
@@ -905,11 +914,11 @@ It is known for users of \pTeX that there is a patch |jlisting.sty| for
905914 use the \emph{showexpl} package.
906915
907916 There is another way to use characters above 256 with the
908+ \emph{listings} package (described in~\cite{apl}). However,
917+ \emph{listings} package (described in\cite{apl}). However,
909918 this method is not suitable for Japanese, since the number of
910919 Japanese characters is very large. We hope that the
911- \emph{listings} package will be able to cope with all characters above
912- 256 in the future.
920+ \emph{listings} package will be able to handle all characters above
921+ 256 without any patch, in the future.
913922
914923
915924 \end{description}
@@ -917,10 +926,11 @@ There is another way to use characters above 256 with the
917926
918927
919928 \section{Implementation}
929+\label{sec:implementation}
920930 \subsection{Handling of Japanese fonts}
921931 In \pTeX, there are three slots for maintaining current fonts, namely
922-|\font| for alphabetic fonts, |\jfont| for Japanese font (in horizontal
923-direction) and |\tfont| for Japanese font (in vertical direction). With
932+|\font| for alphabetic fonts, |\jfont| for Japanese fonts (in horizontal
933+direction) and |\tfont| for Japanese fonts (in vertical direction). With
924934 these slots, we can manage the current font for alphabetic characters
925935 and that for Japanese characters separately in \pTeX. However, \LuaTeX\
926936 has only one slot for maintaining the current font, as \TeX82. This
@@ -947,7 +957,7 @@ they cannot be an argument of |\the|, |\fontname|, nor |\textfont|.
947957
948958 Callbacks by the \emph{luaotfload} package, e.g.,~replacement of glyphs
949959 according to font features, are executed just after `Examination of
950-Stack Level' (see Subsection~\ref{ssec-over}). Note that calculation of
960+Stack Level' (see Subsections \ref{ssec-over}~and~\ref{ssec-stack}). Note that calculation of
951961 character classes for each Japanese character is done \emph{after}
952962 these callbacks for now.
953963
@@ -955,10 +965,10 @@ these callbacks for now.
955965 \label{ssec-stack}
956966
957967 As we noted in Subsection~\ref{ssec-csname}, parameters that the values
958-at the end of a horizontal box or that of a paragraph are effective in
968+at the end of a horizontal box or that of a paragraph are valid in
959969 whole box or paragraph, such as \emph{kanjiskip}, cannot be implemented
960970 by internal integers or registers of other types in \TeX. We explain it
961-in this section.
971+in this subsection.
962972
963973 \begin{figure}
964974 \begin{lstlisting}
@@ -1039,7 +1049,7 @@ needed. In the context of \pTeX, this process was performed using virtual fonts.
10391049 On the other hand, Lua\TeX-ja does the adjustment by encapsulating a glyph
10401050 into a horizontal box. There are two main reasons why we adopted this
10411051 method; one is that we feared Lua codes for coexisting with callbacks by
1042-|luaotfload| package would be large if we use virtual fonts, and the
1052+the |luaotfload| package would be large if we use virtual fonts, and the
10431053 other is to cope with shifting of the baseline of characters at the
10441054 same time.
10451055
@@ -1093,29 +1103,32 @@ same time.
10931103 \end{figure}
10941104
10951105 Figure~\ref{fig-pos} shows the adjustment process. A large square $M$ is
1096-the imaginary body which is specified in the metric, and a vertical
1106+the imaginary body specified in the metric, and a vertical
10971107 rectangle is the imaginary body of a real glyph. First, the real glyph
10981108 is aligned with respect to the width of $M$. In the figure, the real
10991109 glyph is aligned `middle'; this setting is useful for the full-width
1100-middle dot `・'. We have other settings, namely, `left' and `right'.
1110+middle dot `・'. We have other settings, `left' and `right'.
11011111 After that, it is shifted according to the value of |left| and |down|,
1102-which are specified in the metric. The final position of the real glyph
1112+which are specified in the metric, too. The final position of the real glyph
11031113 is shown by the gray rectangle~$R$. If the amount of shifting the baseline is
11041114 not zero, $M$ (and hence the real glyph) is shifted by that amount.
11051115
1106-We would like to remark briefly about the vertical position of a glyph.
1107-A JFM (or the metric used in \LuaTeX-ja) and the real font used for it
1108-may have different height or depth. In that case, it may look better if
1109-the real glyph is shifted vertically to match the height-depth ratio
1110-specified in the metric. This situation is carefully studied by
1116+We would like to remark briefly on the vertical position of a real
1117+glyph. A JFM (or a metric used in \LuaTeX-ja) and a real font used for
1118+it may have different height or depth. In that case, it may look better
1119+if the real glyph is shifted vertically to match the height-depth ratio
1120+specified in the metric, while any vertical adjustment except the
1121+adjustment by the |down| value is not performed in the present
1122+implementation of \LuaTeX-ja. This situation is carefully studied by
11111123 Otobe~\cite{min10}. Here the policy on this problem is not determined
1112-now, however we would like to offer several solutions in future development.
1124+now, however we would like to offer several solutions in future
1125+development.
11131126
11141127 \section{Conclusion}
11151128 We have discussed our \LuaTeX-ja package, which is much influenced
11161129 by \pTeX. For now, it can be used experimentally; however, many
11171130 refinements are needed for regular use. The author hopes
1118-that this paper and this project contribute the typesetting Japanese,
1131+that this paper and the \LuaTeX-ja project contribute to typesetting Japanese,
11191132 and possibly other Asian languages, under \LuaTeX.
11201133
11211134 \section*{Acknowledgements}