Date: Saturday July 27, 2019 @ 05:58
Author: argrath

Update of /cvsroot/perldocjp/docs/perl/5.12.1
In directory sf-cvs:/tmp/cvs-serv82633/perl/5.12.1

Modified Files:
    perlunicode.pod
Log Message:
5.12.1/perlunicode

===================================================================
File: perlunicode.pod    Status: Up-to-date

   Working revision:    1.8 Fri Jul 26 20:58:46 2019
   Repository revision: 1.8 /cvsroot/perldocjp/docs/perl/5.12.1/perlunicode.pod,v
   Sticky Options:      -ko

   Existing Tags:
    No Tags Exist

Index: docs/perl/5.12.1/perlunicode.pod
diff -u docs/perl/5.12.1/perlunicode.pod:1.7 docs/perl/5.12.1/perlunicode.pod:1.8
--- docs/perl/5.12.1/perlunicode.pod:1.7 Tue Jul 23 04:55:14 2019
+++ docs/perl/5.12.1/perlunicode.pod Sat Jul 27 05:58:46 2019
@@ -1,4 +1,3 @@
-
 =encoding euc-jp

 =head1 NAME
@@ -734,11 +733,10 @@

 =end original

-More formally, C<\p{Uppercase}> matches any character whose Unicode Uppercase
-property value is True, and C<\P{Uppercase}> matches any character whose
-Uppercase property value is False, and they could have been written as
-C<\p{Uppercase=True}> and C<\p{Uppercase=False}>, respectively
-(TBT)
+More formally, C<\p{Uppercase}> matches any character whose Unicode Uppercase property value is True,
+and C<\P{UpperCase}> matches any character whose UpperCase property value is False;
+these can also be written as
+C<\p{Uppercase=True}> and C<\p{Uppercase=False}>, respectively.

 =begin original

@@ -753,18 +751,15 @@

 =end original

-This formality is needed when properties are not binary, that is if they can
-take on more values than just True and False.
-For example, the Bidi_Class (see
-L</"Bidirectional Character Types"> below), can take on a number of different
-values, such as Left, Right, Whitespace, and others.
-To match these, one needs
-to specify the property name (Bidi_Class), and the value being matched against
-(Left, Right, I<etc.>).
-This is done, as in the examples above, by having the
-two components separated by an equal sign (or interchangeably, a colon), like
-C<\p{Bidi_Class: Left}>.
-(TBT)
+This form is needed when a property is not binary, that is, when it can take on more
+values than just True and False.
+For example, Bidi_Class (see L</"Bidirectional Character Types">)
+can take on a variety of values, such as Left, Right, and Whitespace.
+To match these, you need to specify the property name (Bidi_Class) and
+the value to match against (Left, Right, and so on).
+This is done, as in the examples above, by separating the two components with an equals sign
+(or, interchangeably, a colon, as in C<\p{Biddi_Class:Left}>)
+between them.

 =begin original

@@ -777,13 +772,11 @@

 =end original

-All Unicode-defined character properties may be written in these compound forms
-of C<\p{property=value}> or C<\p{property:value}>, but Perl provides some
-additional properties that are written only in the single form, as well as
-single-form short-cuts for all binary properties and certain others described
-below, in which you may omit the property name and the equals or colon
-separator.
-(TBT)
+All Unicode-defined character properties can be written in the compound forms
+C<\p{property=value}> or C<\p{property:value}>,
+but so that the property name and the equals or colon separator can be omitted, Perl provides
+additional properties that can be written only in the single form, as well as single-form
+shortcuts for all binary properties and certain others described below.

 =begin original

@@ -803,26 +796,22 @@

 =end original

-Most Unicode character properties have at least two synonyms (or aliases if you
-prefer), a short one that is easier to type, and a longer one which is more
-descriptive and hence it is easier to understand what it means.
-Thus the "L"
-and "Letter" above are equivalent and can be used interchangeably.
-Likewise,
-"Upper" is a synonym for "Uppercase", and we could have written
-C<\p{Uppercase}> equivalently as C<\p{Upper}>.
-Also, there are typically
-various synonyms for the values the property can be.
-For binary properties,
-"True" has 3 synonyms: "T", "Yes", and "Y"; and "False has correspondingly "F",
-"No", and "N".
-But be careful.
-A short form of a value for one property may
-not mean the same thing as the same short form for another.
-Thus, for the General_Category property, "L" means "Letter",
-but for the Bidi_Class property, "L" means "Left".
-A complete list of properties and synonyms is in L<perluniprops>.
-(TBT)
+Most Unicode character properties have at least two synonyms
+(or aliases, if you prefer): a short one that is easy to type, and
+a longer one that is more descriptive and easier to understand.
+Thus the "L" and "Letter" above are equivalent and
+can be used interchangeably.
+Likewise, "Upper" is a synonym for "Uppercase", and C<\p{Uppercase}> can be
+written equivalently as C<\p{Upper}>.
+There are also typically various synonyms for the values of a property.
+For binary properties, "True" has three synonyms: "T", "Yes", and "Y";
+"False" has "F", "No", and "N".
+But be careful.
+A short form of a value for one property does not necessarily mean the same thing
+as the same short form for another property.
+Thus, for the General_Category property, "L" means "Letter",
+but for the Bidi_Class property, "L" means "Left".
+A complete list of properties and synonyms is in L<perluniprops>.

 =begin original

@@ -842,25 +831,20 @@

 =end original

-Upper/lower case differences in the property names and values are irrelevant,
-thus C<\p{Upper}> means the same thing as C<\p{upper}> or even C<\p{UpPeR}>.
-Similarly, you can add or subtract underscores anywhere in the middle of a
-word, so that these are also equivalent to C<\p{U_p_p_e_r}>.
-And white space is irrelevant adjacent to non-word characters,
-such as the braces and the equals or colon separators
-so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are
-equivalent to these as well.
-In fact, in most cases, white space and even
-hyphens can be added or deleted anywhere.
-So even C<\p{ Up-per case = Yes}> is equivalent.
-All this is called "loose-matching" by Unicode.
-The few places where stricter matching is employed is
-in the middle of numbers, and the Perl extension properties that
-begin or end with an underscore.
-Stricter matching
-cares about white space (except adjacent to the non-word characters) and
-hyphens, and non-interior underscores.
-(TBT)
+Upper/lower case differences in property names and values are irrelevant;
+thus C<\p{Upper}> means the same thing as C<\p{upper}> or even
+C<\p{UpPeR}>.
+Similarly, you can add or remove underscores anywhere in the middle of a word,
+so these are also equivalent to C<\p{U_p_p_e_r}>.
+White space adjacent to non-word characters, such as the braces and the equals or colon separators, is also ignored,
+so C<\p{ Upper }> and C<\p{ Upper_case : Y }> are equivalent as well.
+In fact, in most cases, white space and even hyphens can be added or deleted anywhere.
+So even C<\p{Upper case=Yes}> is equivalent.
+All this is called "loose matching" by Unicode.
+The few places where stricter matching is employed are in the middle of numbers, and in
+Perl extension properties that begin or end with an underscore.
+Stricter matching cares about white space (except adjacent to non-word characters), hyphens,
+and non-interior underscores.

 =begin original

@@ -1049,12 +1033,11 @@

 =end original

-The world's languages are written in a number of scripts.
-This sentence (unless you're reading it in translation) is written in Latin,
-while Russian is written in Cyrllic, and Greek is written in, well, Greek;
-Japanese mainly in Hiragana or Katakana.
-There are many more.
-(TBT)
+The world's languages are written in a variety of scripts.
+This sentence (unless you're reading it in translation) is written in Latin script, while Russian
+is written in Cyrillic, and Greek is written in, well, Greek;
+Japanese is written mainly in Hiragana and Katakana.
+There are many more.

 =begin original

@@ -1066,13 +1049,12 @@

 =end original

-The Unicode Script property gives what script a given character is in,
-and can be matched with the compound form like C<\p{Script=Hebrew}> (short:
-C<\p{sc=hebr}>).
-Perl furnishes shortcuts for all script names.
-You can omit everything up through the equals (or colon),
-and simply write C<\p{Latin}> or C<\P{Cyrillic}>.
-(TBT)
+The Unicode Script property gives the script that a given character belongs to,
+and can be matched with a compound form such as
+C<\p{Script=Hebrew}> (short: C<\p{sc=hebr}>).
+Perl provides shortcuts for all script names.
+You can omit everything up through the equals sign (or colon)
+and simply write C<\p{Latin}> or C<\P{Cyrillic}>.

 =begin original

@@ -1178,7 +1160,7 @@
 have a short name defined for them.
 But Perl does provide a (slight) shortcut: for example,
 you can write C<\p{In_Arrows}> or C<\p{In_Hebrew}>.
-For backward compatibility, the C<In> prefix can be omitted if it does not conflict with a script or another property (プロパティ),
+For backward compatibility, the C<In> prefix can be omitted if it does not conflict with a script or another property (特性),
 and in those cases you can even use an C<Is> prefix instead.
 But doing so is not a good idea, for several reasons:

@@ -1260,20 +1242,18 @@

 Unicode defines all its properties in the compound form, so all single-form
 properties are Perl extensions. A number of these are just synonyms for the
-Unicode ones, but some are genunine extensions, including a couple that are in
+Unicode ones, but some are genuine extensions, including a couple that are in
 the compound form. And quite a few of these are actually recommended by Unicode
 (in L<http://www.unicode.org/reports/tr18>).

 =end original

-Unicode defines all its properties in the compound form, so all single-form
-properties are Perl extensions.
-A number of these are just synonyms for the
-Unicode ones, but some are genunine extensions, including a couple that are in
-the compound form.
-And quite a few of these are actually recommended by Unicode
-(in L<http://www.unicode.org/reports/tr18>).
-(TBT)
+Unicode defines all its properties in the compound form, so
+all single-form properties are Perl extensions.
+Many of these are just synonyms for the Unicode ones, but some are
+genuine extensions, including some in the compound form.
+And some of these are actually recommended by Unicode
+(in L<http://www.unicode.org/reports/tr18>).

 =begin original

@@ -1283,10 +1263,9 @@

 =end original

-This section gives some details on all the extensions that aren't synonyms for
-compound-form Unicode properties (for those, you'll have to refer to the
-L<Unicode Standard|http://www.unicode.org/reports/tr44>.
-(TBT)
+This section gives details on all the extensions that are not synonyms for compound-form Unicode properties
+(for those, refer to the
+L<Unicode Standard http://www.unicode.org/reports/tr44>).

 =over

@@ -1409,18 +1388,16 @@

 =end original

-But Unicode's intent is to unify the existing character set standards and
-practices, and a number of pre-existing standards have single characters that
-mean the same thing as some of these combinations.
-An example is ISO-8859-1,
-which has quite a few of these in the Latin-1 range, an example being "LATIN
-CAPITAL LETTER E WITH ACUTE".
-Because this character was in this pre-existing
-standard, Unicode added it to its repertoire.
-But this character is considered
-by Unicode to be equivalent to the sequence consisting of first the character
-"LATIN CAPITAL LETTER E", then the character "COMBINING ACUTE ACCENT".
-(TBT)
+But Unicode's intent is to unify the existing character set standards and practices,
+and a number of pre-existing standards contain single characters that mean the same thing
+as some of these combinations.
+One example is ISO-8859-1, which has quite a few of these in the Latin-1 range,
+such as "LATIN CAPITAL LETTER E WITH ACUTE".
+Because this character was in a pre-existing standard, Unicode added it
+to its repertoire.
+But Unicode considers this character equivalent to the sequence consisting of
+the character "LATIN CAPITAL LETTER E" followed by the character
+"COMBINING ACUTE ACCENT".

 =begin original

@@ -1431,12 +1408,9 @@

 =end original

-"LATIN CAPITAL LETTER E WITH ACUTE" is called a "pre-composed" character, and
-the equivalence with the sequence is called canonical equivalence.
-All pre-composed characters are said to have a decomposition
-(into the equivalent sequence) and
-the decomposition type is also called canonical.
-(TBT)
+"LATIN CAPITAL LETTER E WITH ACUTE" is called a "pre-composed" character,
+and its equivalence with the sequence is called canonical equivalence.
+Every pre-composed character has a decomposition (into the equivalent sequence), and this decomposition type is also called canonical.

 =begin original

@@ -1453,19 +1427,17 @@

 =end original

-However, many more characters have a different type of decomposition,
-a "compatible" or "non-canonical" decomposition.
-The sequences that form these decompositions are not
-considered canonically equivalent to the pre-composed character.
-An example, again in the Latin-1 range, is the "SUPERSCRIPT ONE".
-It is kind of like a regular digit 1, but not exactly;
-its decomposition into the digit 1 is called a "compatible" decomposition,
-specifically a "super" decomposition.
-There are several such compatibility
-decompositions (see L<http://www.unicode.org/reports/tr44>), including one
-called "compat" which means some miscellaneous type of decomposition
-that doesn't fit into the decomposition categories that Unicode has chosen.
-(TBT)
+However, many more characters have a different type of decomposition,
+called a "compatible" or "non-canonical" decomposition.
+The sequences that form these decompositions are not considered canonically equivalent to the pre-composed character.
+An example, again in the Latin-1 range, is "SUPERSCRIPT ONE".
+It is something like a regular digit 1, but not exactly;
+its decomposition into the digit 1
+is called a "compatible" decomposition, specifically a "super" decomposition.
+There are several such compatibility decompositions (see L<http://www.unicode.org/reports/tr44>),
+including one called "compat",
+which means a miscellaneous type of decomposition that does not fit into the decomposition categories
+that Unicode has chosen.

 =begin original

@@ -1725,14 +1697,12 @@

 =end original

-For example, C<U+0041> "LATIN CAPITAL LETTER A" was present in the very first
-Unicode release available, which is C<1.1>,
-so this property is true for all
-valid "*" versions.
-On the other hand, C<U+1EFF> was not assigned until version
-5.1 when it became "LATIN SMALL LETTER Y WITH LOOP",
-so the only "*" that would match it are 5.1, 5.2, and later.
-(TBT)
+For example, C<U+0041> "LATIN CAPITAL LETTER A" has been present since
+C<1.1>, the first Unicode release available,
+so this property is true for all valid "*" versions.
+On the other hand, C<U+1eff> was not assigned until version 5.1,
+when it became "LATIN SMALL LETTER Y WITH LOOP",
+so the only "*" values that match it are 5.1, 5.2, and later.

 =begin original

@@ -1744,13 +1714,12 @@

 =end original

-Unicode furnishes the C<Age> property from which this is derived.
-The problem with Age is that a strict interpretation of it
-(which Perl takes) has
-it matching the precise release a code point's meaning is introduced in.
-Thus C<U+0041> would match only 1.1; and C<U+1EFF> only 5.1.
-This is not usually what you want.
-(TBT)
+Unicode provides the C<Age> property, from which this one is derived.
+The problem with Age is that, under a strict interpretation (which Perl takes), it matches
+only the precise release in which a code point's meaning was introduced.
+Thus C<U+0041> matches only 1.1, and C<U+1eff> matches
+only 5.1.
+This is not usually what you want.

 =begin original

@@ -1759,9 +1728,8 @@

 =end original

-Some non-Perl implementations of the Age property may change its meaning to be
-the same as the Perl Present_In property; just be aware of that.
-(TBT)
+Some non-Perl implementations of the Age property change its meaning to be
+the same as the Perl Present_In property; be aware of that.

 =begin original

@@ -1776,17 +1744,15 @@

 =end original

-Another confusion with both these properties is that the definition is not
-that the code point has been assigned, but that the meaning of the code point
-has been determined.
-This is because 66 code points will always be
-unassigned, and, so the Age for them is the Unicode version the decision to
-make them so was made in.
-For example, C<U+FDD0> is to be permanently
-unassigned to a character, and the decision to do that was made in version 3.1,
-so C<\p{Age=3.1}> matches this character and C<\p{Present_In: 3.1}> and up
-matches as well.
-(TBT)
+Another source of confusion with these properties is that the definition is
+not that the code point has been assigned,
+but that the meaning of the code point has been determined.
+This is because 66 code points will always remain unassigned,
+so the Age for them is the Unicode version in which that decision was made.
+For example, C<U+FDD0> is permanently unassigned to any character,
+and that decision was made in version 3.1,
+so C<\p{Age=3.1}> matches this character, and
+C<\p{\p{Present_In:3.1} and up match as well.

 =item B<C<\p{Print}>>

@@ -1855,7 +1821,7 @@

 =end original

-You can have your own binary (バイナリ) character properties by
+You can have your own binary (2 値) character properties by
 defining subroutines whose names begin with "In" or "Is".
 The subroutines can be defined in any package.
 User-defined properties can be used in the regular expression C<\p> and C<\P> constructs;
@@ -3142,12 +3108,13 @@

 =end original

-This anomaly stems from Perl's attempt to not disturb older programs that
-didn't use Unicode, and hence had no semantics for characters outside of the
-ASCII range (except in a locale), along with Perl's desire to add Unicode
-support seamlessly.
-The result wasn't seamless: these characters were orphaned.
-(TBT)
+This anomaly stems from
+Perl's attempt not to disturb older programs that did not use Unicode
+and hence had no semantics for characters
+outside the ASCII range (except in a locale),
+along with Perl's desire to add Unicode support
+seamlessly.
+The result was not seamless: these characters were orphaned.

 =begin original

@@ -3160,14 +3127,15 @@

 =end original

-Work is being done to correct this,
-but only some of it was complete in time for the 5.12 release.
-What has been finished is the important part of the case changing component.
-Due to concerns, and some evidence, that older code might
-have come to rely on the existing behavior, the new behavior must be explicitly
-enabled by the feature C<unicode_strings> in the L<feature> pragma,
-even though no new syntax is involved.
-(TBT)
+Work is being done to correct this, but only some of it was completed
+in time for the 5.12 release.
+What has been finished is the important part of case changing.
+There are concerns, and some evidence, that
+older code may have come to rely on the existing behavior;
+because of these concerns and this evidence,
+even though no new syntax is involved,
+the new behavior must be explicitly enabled by the C<unicode_strings> feature
+of the L<feature> pragma.

 =begin original

@@ -3178,11 +3146,11 @@

 =end original

-See L<perlfunc/lc> for details on how this pragma works in combination with
-various others for casing. Even though the pragma only affects casing
-operations in the 5.12 release, it is planned to have it affect all the
-problematic behaviors in later releases: you can't have one without them all.
-(TBT)
+For details on how this pragma works in combination with various others for casing,
+see L<perlfunc/lc>.
+Although the pragma affects only casing operations in the 5.12 release,
+it is planned to affect all the problematic behaviors in later releases:
+you cannot have one without them all.

 =begin original

@@ -3510,13 +3478,13 @@

 =end original

-Download the files in the version of Unicode that you want from the Unicode web
-site L<http://www.unicode.org>).
-These should replace the existing files in C<\$Config{privlib}>/F<unicore>.
-(C<\%Config> is available from the Config module.)
-Follow the instructions in F<README.perl> in that directory to change
-some of their names, and then run F<make>.
-(TBT)
+From the Unicode web site L<http://www.unicode.org>, download the files for the
+version of Unicode that you want.
+These files should replace the existing files in
+C<\$Config{privlib}>/F<unicore>.
+(C<\%Config> is available from the Config module.)
+Follow the instructions in F<README.perl> in that directory to change some of their names,
+and then run F<make>.

 =begin original

@@ -3529,13 +3497,12 @@

 =end original

-It is even possible to download them to a different directory, and then change
-F<utf8_heavy.pl> in the directory C<\$Config{privlib}> to point to the new
-directory, or maybe make a copy of that directory before making the change, and
-using C<@INC> or the C<-I> run-time flag to switch between versions at will
-(but because of caching, not in the middle of a process), but all this is
-beyond the scope of these instructions.
-(TBT)
+It is also possible to download them to a different directory
+and then change the directory C<\$Config{privlib}>
+to point to the new directory, or to make a copy of that directory before making the change
+and use C<@INC> or the C<-I> run-time flag to switch
+between versions (but, because of caching, not in the middle of a process);
+all of this, however, is beyond the scope of these instructions.

 =head1 BUGS

@@ -4021,7 +3988,7 @@

 Translate: KIMURA Koichi (-5.8.5)
 Update: SHIRAKATA Kentaro <argra****@ub32*****> (5.10.0-)
-Status: in progress
+Status: completed

 =end meta