[Pythonjp-checkins] [python-doc-ja] 3 new revisions pushed by anywa****@gmail***** on 2011-05-20 07:17 GMT


pytho****@googl*****
Fri, 20 May 2011 16:17:33 JST


3 new revisions:

Revision: 9924099de4c5
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Wed May 18 13:35:26 2011
Log:      Apply updates to the original English source as a diff
http://code.google.com/p/python-doc-ja/source/detail?r=9924099de4c5

Revision: 67be0327a8fe
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Fri May 20 00:14:47 2011
Log:      translate howto/unicode.rst
http://code.google.com/p/python-doc-ja/source/detail?r=67be0327a8fe

Revision: e94aea17f93c
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Fri May 20 00:15:24 2011
Log:      merge
http://code.google.com/p/python-doc-ja/source/detail?r=e94aea17f93c

==============================================================================
Revision: 9924099de4c5
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Wed May 18 13:35:26 2011
Log:      Apply updates to the original English source as a diff
http://code.google.com/p/python-doc-ja/source/detail?r=9924099de4c5

Modified:
  /howto/unicode.rst

=======================================
--- /howto/unicode.rst	Sat Dec  4 02:43:38 2010
+++ /howto/unicode.rst	Wed May 18 13:35:26 2011
@@ -210,11 +210,12 @@
  to reading the Unicode character tables, available at
  <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.

-Two other good introductory articles were written by Joel Spolsky
-<http://www.joelonsoftware.com/articles/Unicode.html> and Jason Orendorff
-<http://www.jorendorff.com/articles/unicode/>.  If this introduction didn't make
-things clear to you, you should try reading one of these alternate articles
-before continuing.
+Another good introductory article was written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html>.
+If this introduction didn't make things clear to you, you should try reading this
+alternate article before continuing.
+
+.. Jason Orendorff XXX http://www.jorendorff.com/articles/unicode/ is broken

  Wikipedia entries are often helpful; see the entries for "character encoding"
  <http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
@@ -471,7 +472,7 @@
  from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
  "Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
  other".  See
-<http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values> for a
+<http://unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values> for a
  list of category codes.

  References

==============================================================================
Revision: 67be0327a8fe
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Fri May 20 00:14:47 2011
Log:      translate howto/unicode.rst
http://code.google.com/p/python-doc-ja/source/detail?r=67be0327a8fe

Modified:
  /howto/unicode.rst

=======================================
--- /howto/unicode.rst	Wed May 18 13:35:26 2011
+++ /howto/unicode.rst	Fri May 20 00:14:47 2011
@@ -1,93 +1,187 @@
-***********************
-  Unicode HOWTO (英語)
-***********************
+*****************
+  Unicode HOWTO
+*****************

  :Release: 1.02

-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
-
-Introduction to Unicode
-=======================
-
-History of Character Codes
---------------------------
-
-In 1968, the American Standard Code for Information Interchange, better known by
-its acronym ASCII, was standardized.  ASCII defined numeric codes for various
-characters, with the numeric values running from 0 to
-127.  For example, the lowercase letter 'a' is assigned 97 as its code
-value.
-
-ASCII was an American-developed standard, so it only defined unaccented
-characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
-which required accented characters couldn't be faithfully represented in ASCII.
-(Actually the missing accents matter for English, too, which contains words such
-as 'naïve' and 'café', and some publications have house styles which require
-spellings such as 'coöperate'.)
-
-For a while people just wrote programs that didn't display accents.  I remember
-looking at Apple ][ BASIC programs, published in French-language publications in
-the mid-1980s, that had lines like these::
+..
+  This HOWTO discusses Python's support for Unicode, and explains various problems
+  that people commonly encounter when trying to work with Unicode.
+
+This HOWTO discusses Python's support for Unicode, and also explains the
+many problems people commonly run into when trying to work with Unicode.
+
+..
+  Introduction to Unicode
+  =======================
+
+Introduction to Unicode
+=======================
+
+..
+  History of Character Codes
+  --------------------------
+
+History of Character Codes
+--------------------------
+
+..
+  In 1968, the American Standard Code for Information Interchange, better known by
+  its acronym ASCII, was standardized.  ASCII defined numeric codes for various
+  characters, with the numeric values running from 0 to
+  127.  For example, the lowercase letter 'a' is assigned 97 as its code
+  value.
+
+In 1968 the American Standard Code for Information Interchange, better
+known by its acronym ASCII, was standardized.  ASCII defines numeric codes
+for various characters, with the values running from 0 to 127.  For
+example, the lowercase letter 'a' is assigned 97 as its code value.
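
A quick interactive sketch of these code values, using the built-in
``ord()`` and ``chr()`` functions (Python 2, which this HOWTO targets)::

    >>> ord('a')     # character to numeric code value
    97
    >>> chr(97)      # numeric code value back to character
    'a'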
+
+..
+  ASCII was an American-developed standard, so it only defined unaccented
+  characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
+  which required accented characters couldn't be faithfully represented in ASCII.
+  (Actually the missing accents matter for English, too, which contains words such
+  as 'naïve' and 'café', and some publications have house styles which require
+  spellings such as 'coöperate'.)
+
+ASCII was an American-developed standard, so it defined only unaccented
+characters.  There was an 'e', but no 'é' or 'Í'.  This meant that languages
+which require accented characters couldn't be faithfully represented in
+ASCII.  (Actually the missing accents were a problem for English too, which
+contains accented words such as 'naïve' and 'café', and some publishers
+have house styles that require spellings such as 'coöperate'.)
+
+..
+  For a while people just wrote programs that didn't display accents.  I remember
+  looking at Apple ][ BASIC programs, published in French-language publications in
+  the mid-1980s, that had lines like these::
+
+For a while people simply wrote programs that didn't display accents.
+I remember seeing Apple ][ BASIC programs, published in French-language
+publications in the mid-1980s, that had lines like these::

  	PRINT "FICHER EST COMPLETE."
  	PRINT "CARACTERE NON ACCEPTE."

-Those messages should contain accents, and they just look wrong to someone who
-can read French.
-
-In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
-hold values ranging from 0 to 255.  ASCII codes only went up to 127, so some
-machines assigned values between 128 and 255 to accented characters.  Different
-machines had different codes, however, which led to problems exchanging files.
-Eventually various commonly used sets of values for the 128-255 range emerged.
-Some were true standards, defined by the International Standards Organization,
-and some were **de facto** conventions that were invented by one company or
-another and managed to catch on.
-
-255 characters aren't very many.  For example, you can't fit both the accented
-characters used in Western Europe and the Cyrillic alphabet used for Russian
-into the 128-255 range because there are more than 127 such characters.
-
-You could write files using different codes (all your Russian files in a coding
-system called KOI8, all your French files in a different coding system called
-Latin1), but what if you wanted to write a French document that quotes some
-Russian text?  In the 1980s people began to want to solve this problem, and the
-Unicode standardization effort began.
-
-Unicode started out using 16-bit characters instead of 8-bit characters.  16
-bits means you have 2^16 = 65,536 distinct values available, making it possible
-to represent many different characters from many different alphabets; an initial
-goal was to have Unicode contain the alphabets for every single human language.
-It turns out that even 16 bits isn't enough to meet that goal, and the modern
-Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
-base-16).
-
-There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
-originally separate efforts, but the specifications were merged with the 1.1
-revision of Unicode.
-
-(This discussion of Unicode's history is highly simplified.  I don't think the
-average Python programmer needs to worry about the historical details; consult
-the Unicode consortium site listed in the References for more information.)
-
-
-Definitions
------------
-
-A **character** is the smallest possible component of a text.  'A', 'B', 'C',
-etc., are all different characters.  So are 'È' and 'Í'.  Characters are
-abstractions, and vary depending on the language or context you're talking
-about.  For example, the symbol for ohms (Ω) is usually drawn much like the
-capital letter omega (Ω) in the Greek alphabet (they may even be the same in
-some fonts), but these are two different characters that have different
-meanings.
-
-The Unicode standard describes how characters are represented by **code
-points**.  A code point is an integer value, usually denoted in base 16.  In the
-standard, a code point is written using the notation U+12ca to mean the
-character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
-of tables listing characters and their corresponding code points::
+..
+  Those messages should contain accents, and they just look wrong to someone who
+  can read French.
+
+Those messages should contain accents, and to someone who can read French
+they simply look wrong.
+
+..
+  In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
+  hold values ranging from 0 to 255.  ASCII codes only went up to 127, so some
+  machines assigned values between 128 and 255 to accented characters.  Different
+  machines had different codes, however, which led to problems exchanging files.
+  Eventually various commonly used sets of values for the 128-255 range emerged.
+  Some were true standards, defined by the International Standards Organization,
+  and some were **de facto** conventions that were invented by one company or
+  another and managed to catch on.
+
+In the 1980s almost all personal computers were 8-bit, meaning that a byte
+could hold values from 0 to 255.  ASCII codes only went up to 127, so some
+machines assigned the values 128 to 255 to accented characters.  Different
+machines used different codes, however, which led to problems exchanging
+files.  Eventually various commonly used sets of values for the 128-255
+range emerged.  Some became true standards, defined by the International
+Standards Organization, and some were **de facto** conventions invented by
+one company or another that managed to catch on.
+
+..
+  255 characters aren't very many.  For example, you can't fit both the accented
+  characters used in Western Europe and the Cyrillic alphabet used for Russian
+  into the 128-255 range because there are more than 127 such characters.
+
+255 characters aren't very many.  For example, you can't fit both the
+accented characters used in Western Europe and the Cyrillic alphabet used
+for Russian into the 128-255 range, because there are more than 127 such
+characters.
+
+..
+  You could write files using different codes (all your Russian files in a coding
+  system called KOI8, all your French files in a different coding system called
+  Latin1), but what if you wanted to write a French document that quotes some
+  Russian text?  In the 1980s people began to want to solve this problem, and the
+  Unicode standardization effort began.
+
+You could write files using different codes (all your Russian files in a
+coding system called KOI8, all your French files in a different coding
+system called Latin1), but what if you wanted to write a French document
+that quotes some Russian text?  In the 1980s people began to want to solve
+this problem, and the Unicode standardization effort began.
+
+..
+  Unicode started out using 16-bit characters instead of 8-bit characters.  16
+  bits means you have 2^16 = 65,536 distinct values available, making it possible
+  to represent many different characters from many different alphabets; an initial
+  goal was to have Unicode contain the alphabets for every single human language.
+  It turns out that even 16 bits isn't enough to meet that goal, and the modern
+  Unicode specification uses a wider range of codes, 0-1,114,111 (0x10ffff in
+  base-16).
+
+Unicode started out using 16-bit characters instead of 8-bit characters.
+Using 16 bits means that 2^16 = 65,536 distinct values are available,
+making it possible to represent many different characters from many
+different alphabets; an initial goal was for Unicode to contain the
+alphabet of every single human language.  It turned out that even 16 bits
+isn't enough to meet that goal, and the modern Unicode specification uses
+a wider range of codes: 0 to 1,114,111 (0x10ffff in base 16).
+
+..
+  There's a related ISO standard, ISO 10646.  Unicode and ISO 10646 were
+  originally separate efforts, but the specifications were merged with the 1.1
+  revision of Unicode.
+
+There is also a related ISO standard, ISO 10646.  Unicode and ISO 10646
+were originally separate efforts, but the specifications were merged with
+the 1.1 revision of Unicode.
+
+..
+  (This discussion of Unicode's history is highly simplified.  I don't think the
+  average Python programmer needs to worry about the historical details; consult
+  the Unicode consortium site listed in the References for more information.)
+
+(This discussion of Unicode's history is highly simplified.  I don't think
+the average Python programmer needs to worry about the historical details;
+consult the Unicode Consortium site listed in the References for more
+information.)
+
+..
+  Definitions
+  -----------
+
+Definitions
+-----------
+
+..
+  A **character** is the smallest possible component of a text.  'A', 'B', 'C',
+  etc., are all different characters.  So are 'È' and 'Í'.  Characters are
+  abstractions, and vary depending on the language or context you're talking
+  about.  For example, the symbol for ohms (Ω) is usually drawn much like the
+  capital letter omega (Ω) in the Greek alphabet (they may even be the same in
+  some fonts), but these are two different characters that have different
+  meanings.
+
+A **character** is the smallest possible component of a text.  'A', 'B',
+'C', and so on are all different characters, and so are 'È' and 'Í'.
+Characters are abstractions, and vary depending on the language or context
+you're talking about.  For example, the symbol for ohms (Ω) is usually
+drawn much like the capital letter omega (Ω) in the Greek alphabet (in
+some fonts they may even be identical), but these are two different
+characters with different meanings.
+
+..
+  The Unicode standard describes how characters are represented by **code
+  points**.  A code point is an integer value, usually denoted in base 16.  In the
+  standard, a code point is written using the notation U+12ca to mean the
+  character with value 0x12ca (4810 decimal).  The Unicode standard contains a lot
+  of tables listing characters and their corresponding code points::
+
+The Unicode standard describes how characters are represented by **code
+points**.  A code point is an integer value, usually written in base 16.
+In the standard, a code point is written using the notation U+12ca, which
+means the character with value 0x12ca (4810 in decimal).  The Unicode
+standard contains many tables listing characters and their corresponding
+code points::

  	0061    'a'; LATIN SMALL LETTER A
  	0062    'b'; LATIN SMALL LETTER B
@@ -95,155 +189,299 @@
          ...
  	007B	'{'; LEFT CURLY BRACKET

-Strictly, these definitions imply that it's meaningless to say 'this is
-character U+12ca'.  U+12ca is a code point, which represents some particular
-character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.  In
-informal contexts, this distinction between code points and characters will
-sometimes be forgotten.
-
-A character is represented on a screen or on paper by a set of graphical
-elements that's called a **glyph**.  The glyph for an uppercase A, for example,
-is two diagonal strokes and a horizontal stroke, though the exact details will
-depend on the font being used.  Most Python code doesn't need to worry about
-glyphs; figuring out the correct glyph to display is generally the job of a GUI
-toolkit or a terminal's font renderer.
-
-
-Encodings
----------
-
-To summarize the previous section: a Unicode string is a sequence of code
-points, which are numbers from 0 to 0x10ffff.  This sequence needs to be
-represented as a set of bytes (meaning, values from 0-255) in memory.  The rules
-for translating a Unicode string into a sequence of bytes are called an
-**encoding**.
-
-The first encoding you might think of is an array of 32-bit integers.  In this
-representation, the string "Python" would look like this::
+..
+  Strictly, these definitions imply that it's meaningless to say 'this is
+  character U+12ca'.  U+12ca is a code point, which represents some particular
+  character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'.  In
+  informal contexts, this distinction between code points and characters will
+  sometimes be forgotten.
+
+Strictly speaking, these definitions imply that it's meaningless to say
+'this is character U+12ca'.  U+12ca is a code point, which represents some
+particular character; in this case it represents the character 'ETHIOPIC
+SYLLABLE WI'.  In informal contexts this distinction between code points
+and characters is sometimes forgotten.
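
The correspondence between code points and character names can be checked
with the standard ``unicodedata`` module; a minimal Python 2 sketch::

    >>> import unicodedata
    >>> unicodedata.name(u'\u12ca')
    'ETHIOPIC SYLLABLE WI'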
+
+..
+  A character is represented on a screen or on paper by a set of graphical
+  elements that's called a **glyph**.  The glyph for an uppercase A, for example,
+  is two diagonal strokes and a horizontal stroke, though the exact details will
+  depend on the font being used.  Most Python code doesn't need to worry about
+  glyphs; figuring out the correct glyph to display is generally the job of a GUI
+  toolkit or a terminal's font renderer.
+
+On a screen or on paper, a character is represented by a set of graphical
+elements called a **glyph**.  The glyph for an uppercase A, for example,
+is two diagonal strokes and a horizontal stroke, though the exact details
+depend on the font being used.  Most Python code doesn't need to worry
+about glyphs; figuring out the correct glyph to display is generally the
+job of a GUI toolkit or a terminal's font renderer.
+
+..
+  Encodings
+  ---------
+
+Encodings
+---------
+
+..
+  To summarize the previous section: a Unicode string is a sequence of code
+  points, which are numbers from 0 to 0x10ffff.  This sequence needs to be
+  represented as a set of bytes (meaning, values from 0-255) in memory.  The rules
+  for translating a Unicode string into a sequence of bytes are called an
+  **encoding**.
+
+To summarize the previous section: a Unicode string is a sequence of code
+points, which are numbers from 0 to 0x10ffff, and this sequence needs to
+be represented in memory as a set of bytes (values from 0 to 255).  The
+rules for translating a Unicode string into a sequence of bytes are called
+an **encoding**.
+
+..
+  The first encoding you might think of is an array of 32-bit integers.  In this
+  representation, the string "Python" would look like this::
+
+The first encoding you might think of is an array of 32-bit integers.
+In this representation, the string "Python" would look like this::

         P           y           t           h           o           n
     0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

-This representation is straightforward but using it presents a number of
-problems.
-
-1. It's not portable; different processors order the bytes differently.
-
-2. It's very wasteful of space.  In most texts, the majority of the code points
-   are less than 127, or less than 255, so a lot of space is occupied by zero
-   bytes.  The above string takes 24 bytes compared to the 6 bytes needed for an
-   ASCII representation.  Increased RAM usage doesn't matter too much (desktop
-   computers have megabytes of RAM, and strings aren't usually that large), but
-   expanding our usage of disk and network bandwidth by a factor of 4 is
-   intolerable.
-
-3. It's not compatible with existing C functions such as ``strlen()``, so a new
-   family of wide string functions would need to be used.
-
-4. Many Internet standards are defined in terms of textual data, and can't
-   handle content with embedded zero bytes.
-
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
-
-Encodings don't have to handle every possible Unicode character, and most
-encodings don't.  For example, Python's default encoding is the 'ascii'
-encoding.  The rules for converting a Unicode string into the ASCII encoding are
-simple; for each code point:
-
-1. If the code point is < 128, each byte is the same as the value of the code
-   point.
-
-2. If the code point is 128 or greater, the Unicode string can't be represented
-   in this encoding.  (Python raises a :exc:`UnicodeEncodeError` exception in this
-   case.)
-
-Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code points
-0-255 are identical to the Latin-1 values, so converting to this encoding simply
-requires converting code points to byte values; if a code point larger than 255
-is encountered, the string can't be encoded into Latin-1.
-
-Encodings don't have to be simple one-to-one mappings like Latin-1.  Consider
-IBM's EBCDIC, which was used on IBM mainframes.  Letter values weren't in one
-block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
-through 153.  If you wanted to use EBCDIC as an encoding, you'd probably use
-some sort of lookup table to perform the conversion, but this is largely an
-internal detail.
-
-UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
-Transformation Format", and the '8' means that 8-bit numbers are used in the
-encoding.  (There's also a UTF-16 encoding, but it's less frequently used than
-UTF-8.)  UTF-8 uses the following rules:
-
-1. If the code point is <128, it's represented by the corresponding byte value.
-2. If the code point is between 128 and 0x7ff, it's turned into two byte values
-   between 128 and 255.
-3. Code points >0x7ff are turned into three- or four-byte sequences, where each
-   byte of the sequence is between 128 and 255.
-
-UTF-8 has several convenient properties:
-
-1. It can handle any Unicode code point.
-2. A Unicode string is turned into a string of bytes containing no embedded zero
-   bytes.  This avoids byte-ordering issues, and means UTF-8 strings can be
-   processed by C functions such as ``strcpy()`` and sent through protocols that
-   can't handle zero bytes.
-3. A string of ASCII text is also valid UTF-8 text.
-4. UTF-8 is fairly compact; the majority of code points are turned into two
-   bytes, and values less than 128 occupy only a single byte.
-5. If bytes are corrupted or lost, it's possible to determine the start of the
-   next UTF-8-encoded code point and resynchronize.  It's also unlikely that
-   random 8-bit data will look like valid UTF-8.
-
-
-
-References
-----------
-
-The Unicode Consortium site at <http://www.unicode.org> has character charts, a
-glossary, and PDF versions of the Unicode specification.  Be prepared for some
-difficult reading.  <http://www.unicode.org/history/> is a chronology of the
-origin and development of Unicode.
-
-To help understand the standard, Jukka Korpela has written an introductory guide
-to reading the Unicode character tables, available at
-<http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
-
-Another good introductory article was written by Joel Spolsky
-<http://www.joelonsoftware.com/articles/Unicode.html>.
-If this introduction didn't make things clear to you, you should try reading this
-alternate article before continuing.
+..
+  This representation is straightforward but using it presents a number of
+  problems.
+
+This representation is straightforward, but using it presents a number of
+problems.
+
+..
+  1. It's not portable; different processors order the bytes differently.
+
+  2. It's very wasteful of space.  In most texts, the majority of the code points
+     are less than 127, or less than 255, so a lot of space is occupied by zero
+     bytes.  The above string takes 24 bytes compared to the 6 bytes needed for an
+     ASCII representation.  Increased RAM usage doesn't matter too much (desktop
+     computers have megabytes of RAM, and strings aren't usually that large), but
+     expanding our usage of disk and network bandwidth by a factor of 4 is
+     intolerable.
+
+  3. It's not compatible with existing C functions such as ``strlen()``, so a new
+     family of wide string functions would need to be used.
+
+  4. Many Internet standards are defined in terms of textual data, and can't
+     handle content with embedded zero bytes.
+
+1. It's not portable; different processors order the bytes differently.
+
+2. It's very wasteful of space.  In most texts the majority of the code
+   points are less than 127, or less than 255, so a lot of space is
+   occupied by zero bytes.  The string above takes 24 bytes, compared to
+   the 6 bytes needed for an ASCII representation.  Increased RAM usage
+   doesn't matter too much (desktop computers have megabytes of RAM, and
+   strings aren't usually that large), but expanding our use of disk and
+   network bandwidth by a factor of 4 is intolerable.
+
+3. It's not compatible with existing C functions such as ``strlen()``, so
+   a new family of wide string functions would need to be used.
+
+4. Many Internet standards are defined in terms of textual data, and can't
+   handle content with embedded zero bytes.
+
+..
+  Generally people don't use this encoding, instead choosing other encodings that
+  are more efficient and convenient.
+
+Generally people don't use this encoding, choosing instead other encodings
+that are more efficient and convenient.
+
+..
+  Encodings don't have to handle every possible Unicode character, and most
+  encodings don't.  For example, Python's default encoding is the 'ascii'
+  encoding.  The rules for converting a Unicode string into the ASCII encoding are
+  simple; for each code point:
+
+Encodings don't have to handle every possible Unicode character, and most
+encodings don't.  For example, Python's default encoding is the 'ascii'
+encoding.  The rules for converting a Unicode string to the ASCII encoding
+are simple; for each code point:
+
+..
+  1. If the code point is < 128, each byte is the same as the value of the code
+     point.
+
+  2. If the code point is 128 or greater, the Unicode string can't be represented
+     in this encoding.  (Python raises a :exc:`UnicodeEncodeError` exception in this
+     case.)
+
+1. If the code point is less than 128, each byte is the same as the value
+   of the code point.
+
+2. If the code point is 128 or greater, the Unicode string can't be
+   represented in this encoding.  (Python raises a
+   :exc:`UnicodeEncodeError` exception in this case.)
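
A minimal interactive sketch of those two rules under Python 2::

    >>> u'abc'.encode('ascii')        # every code point is < 128
    'abc'
    >>> u'caf\xe9'.encode('ascii')    # U+00E9 is 128 or greater
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
                        position 3: ordinal not in range(128)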
+
+..
+  Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code points
+  0-255 are identical to the Latin-1 values, so converting to this encoding simply
+  requires converting code points to byte values; if a code point larger than 255
+  is encountered, the string can't be encoded into Latin-1.
+
+Latin-1, also known as ISO-8859-1, is a similar encoding.  Unicode code
+points 0-255 are identical to the Latin-1 values, so converting to this
+encoding simply requires converting code points to byte values; if a code
+point larger than 255 is encountered, the string can't be encoded into
+Latin-1.
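
Sketched the same way, Latin-1 accepts any code point up to 255::

    >>> u'caf\xe9'.encode('latin-1')   # U+00E9 becomes the single byte 0xe9
    'caf\xe9'
    >>> u'\u12ca'.encode('latin-1')    # a code point larger than 255
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u12ca' in
                        position 0: ordinal not in range(256)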
+
+..
+  Encodings don't have to be simple one-to-one mappings like Latin-1.  Consider
+  IBM's EBCDIC, which was used on IBM mainframes.  Letter values weren't in one
+  block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
+  through 153.  If you wanted to use EBCDIC as an encoding, you'd probably use
+  some sort of lookup table to perform the conversion, but this is largely an
+  internal detail.
+
+Encodings don't have to be simple one-to-one mappings like Latin-1.
+Consider IBM's EBCDIC, which was used on IBM mainframes.  Letter values
+weren't in one block: 'a' through 'i' had values from 129 to 137, but 'j'
+through 'r' were 145 through 153.  If you wanted to use EBCDIC as an
+encoding, you'd probably use some sort of lookup table to perform the
+conversion, but this is largely an internal detail.
+
+..
+  UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
+  Transformation Format", and the '8' means that 8-bit numbers are used in the
+  encoding.  (There's also a UTF-16 encoding, but it's less frequently used than
+  UTF-8.)  UTF-8 uses the following rules:
+
+UTF-8 is one of the most commonly used encodings.  UTF stands for "Unicode
+Transformation Format", and the '8' means that 8-bit numbers are used in
+the encoding.  (There is also a UTF-16 encoding, but it's used less
+frequently than UTF-8.)  UTF-8 uses the following rules:
+
+..
+  1. If the code point is <128, it's represented by the corresponding byte value.
+  2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+     between 128 and 255.
+  3. Code points >0x7ff are turned into three- or four-byte sequences, where each
+     byte of the sequence is between 128 and 255.
+
+1. If the code point is < 128, it's represented by the corresponding byte
+   value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte
+   values between 128 and 255.
+3. Code points > 0x7ff are turned into three- or four-byte sequences,
+   where each byte of the sequence is between 128 and 255.
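
These rules are easy to confirm by checking the length of the encoded
result; a small Python 2 sketch::

    >>> len(u'a'.encode('utf-8'))        # code point < 128: one byte
    1
    >>> len(u'\xe9'.encode('utf-8'))     # 128..0x7ff: two bytes
    2
    >>> len(u'\u12ca'.encode('utf-8'))   # > 0x7ff: three bytes
    3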
+
+..
+  UTF-8 has several convenient properties:
+
+UTF-8 has several convenient properties:
+
+..
+  1. It can handle any Unicode code point.
+  2. A Unicode string is turned into a string of bytes containing no embedded zero
+     bytes.  This avoids byte-ordering issues, and means UTF-8 strings can be
+     processed by C functions such as ``strcpy()`` and sent through protocols that
+     can't handle zero bytes.
+  3. A string of ASCII text is also valid UTF-8 text.
+  4. UTF-8 is fairly compact; the majority of code points are turned into two
+     bytes, and values less than 128 occupy only a single byte.
+  5. If bytes are corrupted or lost, it's possible to determine the start of the
+     next UTF-8-encoded code point and resynchronize.  It's also unlikely that
+     random 8-bit data will look like valid UTF-8.
+
+1. It can handle any Unicode code point.
+2. A Unicode string is turned into a string of bytes containing no embedded
+   zero bytes.  This avoids byte-ordering issues, and means UTF-8 strings
+   can be processed by C functions such as ``strcpy()`` and sent through
+   protocols that can't handle zero bytes.
+3. A string of ASCII text is also valid UTF-8 text.
+4. UTF-8 is fairly compact; the majority of code points are turned into
+   two bytes, and values less than 128 occupy only a single byte.
+5. If bytes are corrupted or lost, it's possible to determine the start of
+   the next UTF-8-encoded code point and resynchronize.  It's also unlikely
+   that random 8-bit data will look like valid UTF-8.
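
Property 3, for example, is easy to verify interactively: a string of ASCII
bytes round-trips through UTF-8 unchanged::

    >>> 'Python'.decode('utf-8')     # plain ASCII is already valid UTF-8
    u'Python'
    >>> u'Python'.encode('utf-8') == 'Python'
    True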
+
+..
+  References
+  ----------
+
+References
+----------
+
+..
+  The Unicode Consortium site at <http://www.unicode.org> has character charts, a
+  glossary, and PDF versions of the Unicode specification.  Be prepared for some
+  difficult reading.  <http://www.unicode.org/history/> is a chronology of the
+  origin and development of Unicode.
+
+The Unicode Consortium site at <http://www.unicode.org> has character
+charts, a glossary, and PDF versions of the Unicode specification.  Be
+prepared for some difficult reading.
+
+<http://www.unicode.org/history/> is a chronology of the origin and
+development of Unicode.
+
+..
+  To help understand the standard, Jukka Korpela has written an introductory guide
+  to reading the Unicode character tables, available at
+  <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+To help you understand the standard, Jukka Korpela has written an
+introductory guide to reading the Unicode character tables; it is
+available at <http://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
+
+..
+  Another good introductory article was written by Joel Spolsky
+  <http://www.joelonsoftware.com/articles/Unicode.html>.
+  If this introduction didn't make things clear to you, you should try reading this
+  alternate article before continuing.
+
+Another good introductory article
+<http://www.joelonsoftware.com/articles/Unicode.html> was written by Joel
+Spolsky.  If this HOWTO's introduction didn't make things clear to you,
+you should try reading that article before continuing.

  .. Jason Orendorff XXX http://www.jorendorff.com/articles/unicode/ is broken

-Wikipedia entries are often helpful; see the entries for "character encoding"
-<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
-<http://en.wikipedia.org/wiki/UTF-8>, for example.
-
-
-Python's Unicode Support
-========================
-
-Now that you've learned the rudiments of Unicode, we can look at Python's
-Unicode features.
-
-
-The Unicode Type
-----------------
-
-Unicode strings are expressed as instances of the :class:`unicode` type, one of
-Python's repertoire of built-in types.  It derives from an abstract type called
-:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
-therefore check if a value is a string type with ``isinstance(value,
-basestring)``.  Under the hood, Python represents Unicode strings as either 16-
-or 32-bit integers, depending on how the Python interpreter was compiled.
-
-The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
-errors])``.  All of its arguments should be 8-bit strings.  The first argument
-is converted to Unicode using the specified encoding; if you leave off the
-``encoding`` argument, the ASCII encoding is used for the conversion, so
-characters greater than 127 will be treated as errors::
+..
+  Wikipedia entries are often helpful; see the entries for "character encoding"
+  <http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+  <http://en.wikipedia.org/wiki/UTF-8>, for example.
+
+Wikipedia entries are often helpful; see, for example, the entries for
+"character encoding" <http://en.wikipedia.org/wiki/Character_encoding> and
+UTF-8 <http://en.wikipedia.org/wiki/UTF-8>.
+
+..
+  Python's Unicode Support
+  ========================
+
+Python's Unicode Support
+========================
+
+..
+  Now that you've learned the rudiments of Unicode, we can look at Python's
+  Unicode features.
+
+Now that you've learned the rudiments of Unicode, we can look at Python's
+Unicode features.
+
+..
+  The Unicode Type
+  ----------------
+
+The Unicode Type
+----------------
+
+..
+  Unicode strings are expressed as instances of the :class:`unicode` type, one of
+  Python's repertoire of built-in types.  It derives from an abstract type called
+  :class:`basestring`, which is also an ancestor of the :class:`str` type; you can
+  therefore check if a value is a string type with ``isinstance(value,
+  basestring)``.  Under the hood, Python represents Unicode strings as either 16-
+  or 32-bit integers, depending on how the Python interpreter was compiled.
+
+Unicode strings are expressed as instances of the :class:`unicode` type,
+one of Python's built-in types.  It derives from an abstract type called
+:class:`basestring`, which is also an ancestor of the :class:`str` type;
+you can therefore check whether a value is a string type with
+``isinstance(value, basestring)``.  Under the hood, Python represents
+Unicode strings as either 16- or 32-bit integers, depending on how the
+Python interpreter was compiled.
+
+..
+  The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
+  errors])``.  All of its arguments should be 8-bit strings.  The first argument
+  is converted to Unicode using the specified encoding; if you leave off the
+  ``encoding`` argument, the ASCII encoding is used for the conversion, so
+  characters greater than 127 will be treated as errors::
+
+The :func:`unicode` constructor has the signature ``unicode(string[,
+encoding, errors])``.  All of its arguments should be 8-bit strings.  The
+first argument is converted to Unicode using the specified encoding; if
+you leave off the ``encoding`` argument, the ASCII encoding is used for
+the conversion, so characters greater than 127 will be treated as errors::

      >>> unicode('abcdef')
      u'abcdef'
@@ -256,11 +494,18 @@
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
                          ordinal not in range(128)

-The ``errors`` argument specifies the response when the input string can't be
-converted according to the encoding's rules.  Legal values for this argument are
-'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
-'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
-Unicode result).  The following examples show the differences::
+..
+  The ``errors`` argument specifies the response when the input string can't be
+  converted according to the encoding's rules.  Legal values for this argument are
+  'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+  'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
+  Unicode result).  The following examples show the differences::
+
+The ``errors`` argument specifies the response when the input string can't
+be converted according to the encoding's rules.  Legal values for this
+argument are 'strict' (raise a ``UnicodeDecodeError`` exception), 'replace'
+(insert U+FFFD, 'REPLACEMENT CHARACTER'), or 'ignore' (just leave the
+character out of the Unicode result).  The following examples show the
+differences::

      >>> unicode('\x80abc', errors='strict')
      Traceback (most recent call last):
@@ -272,25 +517,41 @@
      >>> unicode('\x80abc', errors='ignore')
      u'abc'

-Encodings are specified as strings containing the encoding's name.  Python 2.4
-comes with roughly 100 different encodings; see the Python Library Reference at
-:ref:`standard-encodings` for a list.  Some encodings
-have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
-synonyms for the same encoding.
-
-One-character Unicode strings can also be created with the :func:`unichr`
-built-in function, which takes integers and returns a Unicode string of length 1
-that contains the corresponding code point.  The reverse operation is the
-built-in :func:`ord` function that takes a one-character Unicode string and
-returns the code point value::
+..
+  Encodings are specified as strings containing the encoding's name.  Python 2.4
+  comes with roughly 100 different encodings; see the Python Library Reference at
+  :ref:`standard-encodings` for a list.  Some encodings
+  have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
+  synonyms for the same encoding.
+
+Encodings are specified as strings containing the encoding's name.
+Python 2.4 comes with roughly 100 different encodings; see the Python
+Library Reference at :ref:`standard-encodings` for a list.  Some encodings
+have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are
+all synonyms for the same encoding.
+
+..
+  One-character Unicode strings can also be created with the :func:`unichr`
+  built-in function, which takes integers and returns a Unicode string of length 1
+  that contains the corresponding code point.  The reverse operation is the
+  built-in :func:`ord` function that takes a one-character Unicode string and
+  returns the code point value::
+
+One-character Unicode strings can also be created with the :func:`unichr`
+built-in function, which takes an integer and returns a Unicode string of
+length 1 that contains the corresponding code point.  The reverse operation
+is the built-in :func:`ord` function, which takes a one-character Unicode
+string and returns the code point value::

      >>> unichr(40960)
      u'\ua000'
      >>> ord(u'\ua000')
      40960

-Instances of the :class:`unicode` type have many of the same methods as the
-8-bit string type for operations such as searching and formatting::
+..
+  Instances of the :class:`unicode` type have many of the same methods as the
+  8-bit string type for operations such as searching and formatting::
+
+Instances of the :class:`unicode` type have many of the same methods as
+the 8-bit string type, for operations such as searching and formatting::

     >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
      >>> s.count('e')
@@ -304,10 +565,15 @@
      >>> s.upper()
      u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'

-Note that the arguments to these methods can be Unicode strings or 8-bit
-strings.  8-bit strings will be converted to Unicode before carrying out the
-operation; Python's default ASCII encoding will be used, so characters greater
-than 127 will cause an exception::
+..
+  Note that the arguments to these methods can be Unicode strings or 8-bit
+  strings.  8-bit strings will be converted to Unicode before carrying out the
+  operation; Python's default ASCII encoding will be used, so characters greater
+  than 127 will cause an exception::
+
+Note that the arguments to these methods can be Unicode strings or 8-bit
+strings.  8-bit strings are converted to Unicode before the operation is
+carried out; Python's default ASCII encoding is used, so characters
+greater than 127 will cause an exception::

      >>> s.find('Was\x9f')
      Traceback (most recent call last):
@@ -316,16 +582,27 @@
      >>> s.find(u'Was\x9f')
      -1

-Much Python code that operates on strings will therefore work with Unicode
-strings without requiring any changes to the code.  (Input and output code needs
-more updating for Unicode; more on this later.)
-
-Another important method is ``.encode([encoding], [errors='strict'])``, which
-returns an 8-bit string version of the Unicode string, encoded in the requested
-encoding.  The ``errors`` parameter is the same as the parameter of the
-``unicode()`` constructor, with one additional possibility; as well as 'strict',
-'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
-character references.  The following example shows the different results::
+..
+  Much Python code that operates on strings will therefore work with Unicode
+  strings without requiring any changes to the code.  (Input and output code needs
+  more updating for Unicode; more on this later.)
+
+Much Python code that operates on strings will therefore work with Unicode
+strings without requiring any changes to the code.  (Input and output code
+needs more updating for Unicode; more on this later.)
+
+..
+  Another important method is ``.encode([encoding], [errors='strict'])``, which
+  returns an 8-bit string version of the Unicode string, encoded in the requested
+  encoding.  The ``errors`` parameter is the same as the parameter of the
+  ``unicode()`` constructor, with one additional possibility; as well as 'strict',
+  'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
+  character references.  The following example shows the different results::
+
+Another important method is ``.encode([encoding], [errors='strict'])``,
+which returns an 8-bit string version of the Unicode string, encoded in
+the requested encoding.  The ``errors`` parameter is the same as the
+parameter of the ``unicode()`` constructor, with one additional
+possibility; as well as 'strict', 'ignore', and 'replace', you can also
+pass 'xmlcharrefreplace', which uses XML's character references.  The
+following example shows the different results::

      >>> u = unichr(40960) + u'abcd' + unichr(1972)
      >>> u.encode('utf-8')
@@ -341,8 +618,12 @@
      >>> u.encode('ascii', 'xmlcharrefreplace')
      '&#40960;abcd&#1972;'

-Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
-interprets the string using the given encoding::
+..
+  Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
+  interprets the string using the given encoding::
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method,
+which interprets the string using the given encoding::

      >>> u = unichr(40960) + u'abcd' + unichr(1972)   # Assemble a string
      >>> utf8_version = u.encode('utf-8')             # Encode as UTF-8
@@ -352,31 +633,60 @@
     >>> u == u2                                      # The two strings match
      True

-The low-level routines for registering and accessing the available encodings are
-found in the :mod:`codecs` module.  However, the encoding and decoding functions
-returned by this module are usually more low-level than is comfortable, so I'm
-not going to describe the :mod:`codecs` module here.  If you need to implement a
-completely new encoding, you'll need to learn about the :mod:`codecs` module
-interfaces, but implementing encodings is a specialized task that also won't be
-covered here.  Consult the Python documentation to learn more about this module.
-
-The most commonly used part of the :mod:`codecs` module is the
-:func:`codecs.open` function which will be discussed in the section on input and
-output.
-
-
-Unicode Literals in Python Source Code
---------------------------------------
-
-In Python source code, Unicode literals are written as strings prefixed with the
-'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
-using the ``\u`` escape sequence, which is followed by four hex digits giving
-the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
-digits, not 4.
-
-Unicode literals can also use the same escape sequences as 8-bit strings,
-including ``\x``, but ``\x`` only takes two hex digits so it can't express an
-arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
+..
+  The low-level routines for registering and accessing the available encodings are
+  found in the :mod:`codecs` module.  However, the encoding and decoding functions
+  returned by this module are usually more low-level than is comfortable, so I'm
+  not going to describe the :mod:`codecs` module here.  If you need to implement a
+  completely new encoding, you'll need to learn about the :mod:`codecs` module
+  interfaces, but implementing encodings is a specialized task that also won't be
+  covered here.  Consult the Python documentation to learn more about this module.
+
+The low-level routines for registering and accessing the available
+encodings are found in the :mod:`codecs` module.  However, the encoding
+and decoding functions returned by this module are usually more low-level
+than is comfortable, so I won't describe the :mod:`codecs` module here.
+If you need to implement a completely new encoding, you'll need to learn
+about the :mod:`codecs` module interfaces, but implementing encodings is a
+specialized task that also won't be covered here.  Consult the Python
+documentation to learn more about this module.
+
+..
+  The most commonly used part of the :mod:`codecs` module is the
+  :func:`codecs.open` function which will be discussed in the section on input and
+  output.
+
+
+The most commonly used part of the :mod:`codecs` module is the
+:func:`codecs.open` function, which will be discussed in the section on
+input and output.
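
A minimal sketch of :func:`codecs.open` in use (the file name here is just
an illustration)::

    >>> import codecs
    >>> f = codecs.open('test.txt', 'w', encoding='utf-8')
    >>> f.write(u'caf\xe9')    # accepts unicode; UTF-8 bytes reach the disk
    >>> f.close()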
+
+..
+  Unicode Literals in Python Source Code
+  --------------------------------------
+
+Unicode Literals in Python Source Code
+--------------------------------------
+
+..
+  In Python source code, Unicode literals are written as strings prefixed with the
+  'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points can be written
+  using the ``\u`` escape sequence, which is followed by four hex digits giving
+  the code point.  The ``\U`` escape sequence is similar, but expects 8 hex
+  digits, not 4.
+
+In Python source code, Unicode literals are written as strings prefixed
+with the 'u' or 'U' character: ``u'abcdefghijk'``.  Specific code points
+can be written using the ``\u`` escape sequence, followed by four hex
+digits giving the code point.  The ``\U`` escape sequence is similar, but
+expects eight hex digits rather than four.
+
+..
+  Unicode literals can also use the same escape sequences as 8-bit strings,
+  including ``\x``, but ``\x`` only takes two hex digits so it can't express an
+  arbitrary code point.  Octal escapes can go up to U+01ff, which is octal 777.
+
+Unicode literals can also use the same escape sequences as 8-bit strings,
+including ``\x``, but ``\x`` only takes two hex digits, so it can't
+express an arbitrary code point.  Octal escapes can go up to U+01ff, which
+is octal 777.

  ::

@@ -388,20 +698,35 @@
      ...
      97 172 4660 8364 32768

-Using escape sequences for code points greater than 127 is fine in small doses,
-but becomes an annoyance if you're using many accented characters, as you would
-in a program with messages in French or some other accent-using language.  You
-can also assemble strings using the :func:`unichr` built-in function, but this is
-even more tedious.
-
-Ideally, you'd want to be able to write literals in your language's natural
-encoding.  You could then edit Python source code with your favorite editor
-which would display the accented characters naturally, and have the right
-characters used at runtime.
-
-Python supports writing Unicode literals in any encoding, but you have to
-declare the encoding being used.  This is done by including a special comment as
-either the first or second line of the source file::
+..
+  Using escape sequences for code points greater than 127 is fine in small doses,
+  but becomes an annoyance if you're using many accented characters, as you would
+  in a program with messages in French or some other accent-using language.  You
+  can also assemble strings using the :func:`unichr` built-in function, but this is
+  even more tedious.
+
+Using escape sequences for code points greater than 127 is fine in small
+doses, but it becomes an annoyance if you're using many accented
+characters, as you would in a program with messages in French or some
+other accent-using language.  You can also assemble strings with the
+:func:`unichr` built-in function, but that is even more tedious.
+
+..
+  Ideally, you'd want to be able to write literals in your language's natural
+  encoding.  You could then edit Python source code with your favorite editor
+  which would display the accented characters naturally, and have the right
+  characters used at runtime.
+
+Ideally, you'd want to be able to write literals in your language's
+natural encoding.  You could then edit Python source code with your
+favorite editor, which would display the accented characters naturally,
+and have the right characters used at runtime.
+
+..
+  Python supports writing Unicode literals in any encoding, but you have to
+  declare the encoding being used.  This is done by including a special comment as
+  either the first or second line of the source file::
+
+Python supports writing Unicode literals in any encoding, but you have to
+declare the encoding being used.  This is done by including a special
+comment as either the first or second line of the source file::

      #!/usr/bin/env python
***The diff for this file has been truncated for email.***
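
The special comment being introduced is the standard PEP 263 coding
declaration; a minimal sketch of the format (an illustration, not a
reproduction of the truncated diff)::

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    # With the declaration above, accented characters can appear directly
    # in Unicode literals in this source file.
    u = u'café'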

==============================================================================
Revision: e94aea17f93c
Author:   Akihiro Uchida <uchid****@ike-d*****>
Date:     Fri May 20 00:15:24 2011
Log:      merge
http://code.google.com/p/python-doc-ja/source/detail?r=e94aea17f93c




