Commit

This is a slow ctype library for sjis characters: a ctype-style character classification library for Shift-JIS that makes no claim to speed.
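As a rough illustration of what a ctype-style classifier for Shift-JIS has to do, the sketch below guesses the byte length of the character at a pointer. The byte ranges are the ones used by slowsjIsPOneByte(), slowsjIsPHighByte(), and slowsjIsPLowByte() in slowsjctype.c. The names sj_is_one_byte(), sj_is_high_byte(), sj_is_low_byte(), and sj_guess_count() are hypothetical and are not part of the library. Each byte is read through an unsigned char pointer so that values of 0x80 and above are not sign-extended, which is the pitfall the comments in slowsjctype.c warn about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers; only the byte ranges come from slowsjctype.c. */

/* 7-bit characters and the one-byte katakana range 0xa1..0xdf. */
int sj_is_one_byte( const char * p )
{
    int b = * (const unsigned char *) p; /* read unsigned: no sign extension */
    return b < 0x80 || ( b >= 0xa1 && b <= 0xdf );
}

/* Possible first byte of a two-byte character. */
int sj_is_high_byte( const char * p )
{
    int b = * (const unsigned char *) p;
    return ( b >= 0x81 && b <= 0x9f ) || ( b >= 0xe0 && b <= 0xfc );
}

/* Possible second byte of a two-byte character (0x7f is a gap). */
int sj_is_low_byte( const char * p )
{
    int b = * (const unsigned char *) p;
    return b >= 0x40 && b <= 0xfc && b != 0x7f;
}

/* Returns 2 for a plausible two-byte character, 1 for a one-byte
// character, and 0 when the byte fits neither pattern.
*/
int sj_guess_count( const char * p )
{
    assert( p != NULL );
    if ( sj_is_high_byte( p ) && sj_is_low_byte( p + 1 ) )
        return 2;
    return sj_is_one_byte( p ) ? 1 : 0;
}
```

With these ranges, sj_guess_count() returns 1 for the ASCII letter A, 2 for the byte pair 0x82 0xa0, and 1 for the single byte 0xb1. The second byte is only examined after the first has qualified as a high byte, so a one-byte character at the end of a string does not cause a read past the terminator.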


Commit metadata

Revision: 0466ca32c85e3c997c5e3f0cf49556c5507593fd (tree)
Date: 2014-01-03 14:43:32
Author: Joel Matthew Rees <reiisi@user...>
Committer: Joel Matthew Rees

Log message

More at the training center -- starting a hexdump

Diff

--- /dev/null
+++ b/sjhexdump.c
@@ -0,0 +1,53 @@
+/* Self-study at the training center
+// Let's try writing a hexdump.
+//
+// by Joel Rees
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+
+
+#define DEFAULTWIDTH 16
+
+#define READSIZE 128
+#define BUFFSIZE ( READSIZE + 4 )
+
+char buffer[ BUFFSIZE ];
+
+
+int main( void )
+{
+    FILE * in = stdin;
+    unsigned long address = 0; /* Files smaller than 4 GB. */
+    int column = 0;
+    int columnLimit = DEFAULTWIDTH;
+    int ch;
+
+    while ( ( ch = fgetc( in ) ) != EOF )
+    {
+        if ( column == 0 )
+        {
+            printf( "0x%08lx: ", address ); /* %lx matches unsigned long */
+        }
+        buffer[ column ] = isprint( ch ) ? (char) ch : '.';
+        printf( "%02x ", ch );
+        ++address;
+        ++column;
+        if ( column >= columnLimit )
+        {
+            buffer[ column ] = '\0';
+            printf( " %s\n", buffer );
+            column = 0;
+        }
+    }
+    if ( column > 0 )
+        fputc( '\n', stdout );
+
+    return EXIT_SUCCESS;
+}
+
+
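In sjhexdump.c as committed, a final row shorter than columnLimit ends with a bare newline, so the text column for that row is never printed. One possible way to finish the row is sketched below: pad the hex column with spaces, then print the buffered text. The function hexdump_stream() and the WIDTH macro are hypothetical names introduced for this sketch; nothing in the commit says this is how the author intends to complete the program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define WIDTH 16 /* same value as DEFAULTWIDTH in sjhexdump.c */

/* Hypothetical helper: dump everything readable from in to out. */
void hexdump_stream( FILE * in, FILE * out )
{
    char text[ WIDTH + 1 ];
    unsigned long address = 0;
    int column = 0;
    int ch;

    assert( in != NULL && out != NULL );
    while ( ( ch = fgetc( in ) ) != EOF )
    {
        if ( column == 0 )
            fprintf( out, "0x%08lx: ", address );
        text[ column ] = isprint( ch ) ? (char) ch : '.';
        fprintf( out, "%02x ", (unsigned) ch );
        ++address;
        ++column;
        if ( column >= WIDTH )
        {
            text[ column ] = '\0';
            fprintf( out, " %s\n", text );
            column = 0;
        }
    }
    if ( column > 0 )
    {
        int pad;

        text[ column ] = '\0';
        for ( pad = column; pad < WIDTH; ++pad )
            fputs( "   ", out ); /* three spaces per missing byte */
        fprintf( out, " %s\n", text );
    }
}
```

Calling hexdump_stream( stdin, stdout ) from main() should give the same output as the committed program for complete rows, and additionally keeps the text column aligned on a short last row.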
--- a/slowsjctype.c
+++ b/slowsjctype.c
@@ -1 +1 @@
1-/* slowsjctype.c v00.00.01.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, beginning April 2001. // joel_rees@sannet.ne.jp // // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char pointers. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. 
Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. // Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. 
The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ /* Primary references for the ranges chosen below: // // Character palette from Apple's Kotoeri input method, systems 7/8/9. // Publisher: Apple, included with Apple's Macintosh operating systems. // The character palettes since sys. 8.0 or 8.1 have included primary pronunciations, // as well as JIS, kuten, and UNICODE assignments, in a detailed view. // Since at least sys. 8.5 or 8.6, a flag appears when a non-standard character is selected. // Newer versions track the changes to the various standards. // // Pasokon/Waapuro Kanji Jiten, 1987 Edition // Compiler: Tsutomu Uegaki; Publisher: Natsume-sha (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Contains a nice rectanglular arrangement of Kanji on pages 588-599. // // Waapuro/Pasokon Saishin Kanji Jiten, 1st Edition (1994) // Compiler: Shougakukan Dictionary Editors Department; // Publisher: Shougakukan (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Includes a list of the proposed annex characters, with annex numbers. // The annex characters have been assigned actual codes since this edition was published. // // Pasokon Yougo Jiten, 1992-93 Edition // Authors: Shigeru Okamoto, Ichirou Senba, Yoshiaki Nakamura, Kazuko Takahashi; // Publisher: Gijutsu Hyouron-sha (Shinjuku-ku). // Dictionary of personal computer terminology, // particularly referenced the JIS/ISO/ANSI 8-bit character tables starting page 409. 
*/ #include "sjctypenv.h" #include "sj8bitChars.h" #include "sj16bitChars.h" #include "slowsjctype.h" /* Because char is probably signed, // it is usually liable to induce errors to use escaped char constant notation. // '\x80' may well be something like 0xffffff80, rather than 0x80. // Hopefully, I have been consistent about this. <erg/> // Note the problems when comparing a char variable with a character constant: // char scan; . . . while ( scan <= 0x9f ) // will produce an infinite loop, which is probably not the desired effect. // 0x9f is an integer equal to decimal 159. // '\x9f' is a char and promotes to integer with sign extension: // ( -( 256 - 159 ) ) == ( -97 ) // Two's complement. // . . . while ( scan <= 'x9f' ) // will probably produce the desired result, but by an un-expected calculation. // For instance, // scan = 0x9e; if ( scan < '\x9f' ) // yields true because -98 is less than -97, not because 158 is less than 159. // I tend to forget which is which in the middle of loops, // so I usually use long integers in loops (which is a good idea anyway) // and avoid comparing to integer constants. // This is also a reason I use symbolic constants instead of directly using characters. // // This shows one of the many reasons for having some means of dialect control, // instead of constraining the one-and-only standard in ways that turn out to be non-optimal. */ /* Cleared the unwanted dependency on sjctypenv.h (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, as noted below. // This mod by Joel Matthew Rees, released under original terms of use. 
*/ int slowsjIsPOneByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int b = * ( (ubyte *) chp ); return b < 0x80 || ( b >= 0xa1 && b <= 0xdf ); } int slowsjIsPHighByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bHi = ( (ubyte *) chp )[ 0 ]; return ( bHi >= 0x81 && bHi <= 0x9f ) || ( bHi >= 0xe0 && bHi <= 0xfc ); } int slowsjIsPLowByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo >= 0x40 && bLo <= 0xfc && bLo != 0x7f; } int slowsjIsP7bit( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo < 0x80; } int slowsjPGuessCount( char * chp ) { return ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) ) ? 2 : slowsjIsPOneByte( chp ) ? 1 : 0; } int slowsjIsPCntrl( char * chp ) { int uch = (ubyte) chp[ 0 ]; return ( uch <= 0x1f || uch == 0x7f ) ? 1 : 0; /* DEL added JMR2001.05.23 */ /* The standard doesn't know for unit separator. */ } int slowsjIsPSpace( char * chp ) { ubyte * uchp = (ubyte *) chp; switch ( * uchp ) { case b7_HT: case b7_LF: case b7_VT: case b7_FF: case b7_CR: case b7_SP: return 1; default: return ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) ? 2 : 0; /* 0x8140 is sjis 2-byte space */ } } int slowsjIsPDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ZERO[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_ZERO[ 1 ] && b <= b16_NINE[ 1 ] ) ? 2 : 0; } else { return ( b >= b7_ZERO && b <= b7_NINE ) ? 1 : 0; } } int slowsjIsPXDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( ( b >= b16_A[ 1 ] && b <= b16_F[ 1 ] ) || ( b >= b16_a[ 1 ] && b <= b16_f[ 1 ] ) ) ? 2 : slowsjIsPDigit( chp ); } else { return ( ( b >= b7_A && b <= b7_F ) || ( b >= b7_a && b <= b7_f ) ) ? 
1 : slowsjIsPDigit( chp ); } } int slowsjIsPRomanLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_a[ 1 ] && b <= b16_z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_a && b <= b7_z ) ? 1 : 0; } } int slowsjIsPRomanUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_A[ 1 ] && b <= b16_Z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_A && b <= b7_Z ) ? 1 : 0; } } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPRoman( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPRomanUpper( chp ); return result; } int slowsjIsPGreekLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_alpha[ 0 ] ) && ( b >= b16_alpha[ 1 ] && b <= b16_omega[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPGreekUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_ALPHA[ 0 ] ) && ( b >= b16_ALPHA[ 1 ] && b <= b16_OMEGA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPGreek( char * chp ) { int result = slowsjIsPGreekLower( chp ); if ( result == 0 ) slowsjIsPGreekUpper( chp ); return result; } int slowsjIsPRussianLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_a[ 0 ] ) && ( b >= b16_Russian_a[ 1 ] && b <= b16_Russian_ya[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPRussianUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_A[ 0 ] ) && ( b >= b16_Russian_A[ 1 ] && b <= b16_Russian_YA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. 
*/ int slowsjIsPRussian( char * chp ) { int result = slowsjIsPRussianLower( chp ); if ( result == 0 ) slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPUpper( char * chp ) { int result = slowsjIsPRomanUpper( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPLower( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPRussianLower( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPEurAsianAlpha( char * chp ) { int result = slowsjIsPRoman( chp ); if ( result == 0 ) result = slowsjIsPGreek( chp ); if ( result == 0 ) result = slowsjIsPRussian( chp ); return result; } int slowsjIsPQuasiEurAsianAlpha( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_AccentAcute_Prime[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_AccentAcute_Prime[ 1 ] || b == b16_AccentGrave[ 1 ] || b == b16_Umlaut[ 1 ] || b == b16_AccentCircumflex[ 1 ] || b == b16_Overline_Negate[ 1 ] || b == b16_QuarterDash_Hyphen[ 1 ] || b == b16_WavyDash_Tilde[ 1 ] ) ? 2 : 0; } else { return ( b == b7_HYPHEN || b == b7_ACCENTGRAVE || b == b7_TILDE || b == b7_CARET ) ? 1 : 0; } } int slowsjIsPHiragana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_hiraganaSub_a[ 0 ] ) && ( b >= b16_hiraganaSub_a[ 1 ] && b <= b16_hiragana_ng[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPKatakana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_katakanaSub_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_katakanaSub_a[ 1 ] && b <= b16_katakanaSub_ke[ 1 ] && b != 0x7f ) ? 
2 : 0; } else { return ( ( b >= b8_katakana_wo && b <= b8_katakanaSub_tu ) || ( b >= b8_katakana_a && b <= b8_katakana_ng ) ) ? 1 : 0; } } /* Time biased against katakana, but we don't care on the slow version. */ int slowsjIsPKana( char * chp ) { int result = slowsjIsPHiragana( chp ); if ( result == 0 ) result = slowsjIsPKatakana( chp ); return result; } int slowsjIsPQuasiKana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_DakuTen[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_DakuTen[ 1 ] || b == b16_HanDakuTen[ 1 ] || b == b16_KatakanaRepeat[ 1 ] || b == b16_KatakanaRepeatVoiced[ 1 ] || b == b16_HiraganaRepeat[ 1 ] || b == b16_HiraganaRepeatVoiced[ 1 ] || b == b16_ChoOn[ 1 ] ) ? 2 : 0; } else { return ( b == b8_ChoOn || b == b8_DakuTen || b == b8_HandakuTen ) ? 1 : 0; } } /* This has even time-bias for JIS level 1. */ int slowsjIsPKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) && ( ( bHi == b16_kanji1Low_a[ 0 ] && bLo >= b16_kanji1Low_a[ 1 ] ) || ( bHi > b16_kanji1Low_a[ 0 ] && bHi < b16_kanji1High_ude[ 0 ] ) || ( bHi == b16_kanji1High_ude[ 0 ] && bLo <= b16_kanji1High_ude[ 1 ] ) || ( bHi == b16_kanji2aLow_ichi[ 0 ] && bLo >= b16_kanji2aLow_ichi[ 1 ] ) || ( bHi > b16_kanji2aLow_ichi[ 0 ] && bHi <= b16_kanji2aHigh_jou[ 0 ] ) /* The rows at the end of 2a and beginning of 2b are complete. */ || ( bHi >= b16_kanji2bLow_you[ 0 ] && bHi <= b16_kanji2bHigh_hikaru[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru[ 0 ] && bLo <= b16_kanji2bHigh_hikaru[ 1 ] ) ) ) return 2; else return 0; } /* This is completely time-biased against kanji, and a little harder to mentally verify. 
{ ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( !slowsjIsPHighByte( chp ) || !slowsjIsPLowByte( chp + 1 ) || bHi < b16_kanji1Low_a_sub[ 0 ] || ( bHi == b16_kanji1Low_a_sub[ 0 ] && bLo < b16_kanji1Low_a_sub[ 1 ] ) || ( bHi == b16_kanji1High_ude_arm[ 0 ] && bLo > b16_kanji1High_ude_arm[ 1 ] && bLo < b16_kanji2aLow_ichi_formalOne[ 1 ] ) || ( bHi > b16_kanji2aHigh_ude_arm[ 0 ] && bHi < b16_kanji2bLow_yo_e040[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru_eaa4[ 0 ] && bLo > b16_kanji2bHigh_hikaru_eaa4[ 1 ] ) || bHi > b16_kanji2bHigh_hikaru_eaa4[ 0 ] ) return 0; else return 2; } */ int slowsjIsPQuasiKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_KanjiIbid[ 0 ] ) && ( b >= b16_KanjiIbid[ 1 ] /* This might be a proper Kanji? */ || b <= b16_Ditto[ 1 ] /* Should this be only with European mods? */ || b <= b16_Shime[ 1 ] /* Probably not Kanji? */ || b <= b16_KanjiZero[ 1 ] /* Should this be Kanji? */ || b <= b16_OpenCircle_Maru[ 1 ] /* Often used as fill-in-th-blank. */ || b <= b16_KanjiRepeat[ 1 ] ) )? 2 : 0; } /* Run-time bias against everybody. // Should give fairly even timing in general use // and give best timing for generating tables. */ int slowsjIsPAlpha( char * chp ) { int result = slowsjIsPKanji( chp ); if ( result == 0 ) result = slowsjIsPKana( chp ); if ( result == 0 ) result = slowsjIsPEurAsianAlpha( chp ); return result; } /* Use the same bias as alpha, just to be obnoxious. */ int slowsjIsPQuasiAlpha( char * chp ) { int result = slowsjIsPQuasiKanji( chp ); if ( result == 0 ) result = slowsjIsPQuasiKana( chp ); if ( result == 0 ) result = slowsjIsPQuasiEurAsianAlpha( chp ); return result; } /* Bias? What bias? */ int slowsjIsPAlNum( char * chp ) { int result = slowsjIsPDigit( chp ); if ( result == 0 ) result = slowsjIsPAlpha( chp ); return result; } /* Bias? What bias? 
*/ int slowsjIsPAlNumQuasi( char * chp ) { int result = slowsjIsPQuasiAlpha( chp ); if ( result == 0 ) result = slowsjIsPAlNum( chp ); return result; } int slowsjIsPLineDraw( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_LineDraw_1H[ 0 ] ) && ( b >= b16_LineDraw_1H[ 1 ] && b <= b16_LineDraw_1H2V[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPPunct( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ToTen[ 0 ] ) /* Nice of the JIS comittee to put them all together. */ { b = uchp[ 1 ]; return ( b != 0x7f /* Check and excuse later */ && ( ( b >= b16_ToTen[ 1 ] && b <= b16_Geta[ 1 ] ) || ( b >= b16_Element[ 1 ] && b <= b16_Intersection[ 1 ] ) || ( b >= b16_Conjunction_And[ 1 ] && b <= b16_Exists[ 1 ] ) || ( b >= b16_Angle[ 1 ] && b <= b16_DoubleIntegral[ 1 ] ) || ( b >= b16_Angstrom[ 1 ] && b <= b16_Paragraph[ 1 ] ) || ( b == b16_CompositionCircle[ 1 ] ) ) ) ? 2 : 0; } else { return ( ( b >= b7_EXCLAIM && b <= b7_SLASH ) || ( b >= b7_COLON && b <= b7_ATEACH ) || ( b >= b7_LEFTBRACKET && b <= b7_ACCENTGRAVE ) || ( b >= b7_LEFTBRACE && b <= b7_TILDE ) || ( b >= b8_Kuten && b <= b8_ChuTen ) || ( b == b8_ChoOn ) || ( b >= b8_DakuTen && b <= b8_HandakuTen ) ) ? 1 : 0; } } int slowsjIsPGraph( char * chp ) { int result = slowsjIsPAlNum( chp ); if ( result == 0 ) result = slowsjIsPPunct( chp ); return result; } int slowsjIsPPrint( char * chp ) { ubyte * uchp = (ubyte *) chp; if ( * uchp == b7_SP ) return 1; else if ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) return 2; else return slowsjIsPGraph( chp ); } /* Macro to isprint() works just fine because there are no two-byte control characters. int slowsjIsP2Byte( char * chp ) {} */ /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. // Some are entirely above and some entirely below. // JIS Roman/Greek/Russian doesn't include any caseless characters in my materials. 
// But if they did I could test the converted character for validity before returning it. // Just for fun, I'll include the test anyway. */ int slowsjPToLowerRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = (ubyte) ( uchpin[ 0 ] + ( b7_a - b7_A ) ); break; case 2: temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_a[ 1 ] - b16_A[ 1 ] ) ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanLower( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = (ubyte) ( uchpin[ 0 ] - ( b7_a - b7_A ) ); break; case 2: temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_a[ 1 ] - b16_A[ 1 ] ) ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ) ); /* No gap */ } if ( count == 2 && slowsjIsPGreekLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekLower( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] 
= uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ) ); /* No gap */ } if ( count == 2 && slowsjIsPGreekUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ) ); if ( temp[ 1 ] >= 0x7f ) /* Adjust for the gap. */ ++temp[ 1 ]; /* Borland didn't like += 1. */ } if ( count == 2 && slowsjIsPRussianLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianLower( chpin ); /* Checks the gap. */ ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ) ); if ( uchpin[ 1 ] > 0x7f ) /* Adjust for the gap (0x7f already filtered above). */ --temp[ 1 ]; /* Borland didn't like -= 1. */ } if ( count == 2 && slowsjIsPRussianUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } /* Again, time-biased in favor of the most likely. (Russian and Greek are not as commonly used.) // Would be faster to test directly, but that increases logical coupling // (increases the chance for algorithmic errors). // Reducing errors is a higher priority than speed. 
*/ int slowsjPToLower( char * chpin, char * chpout ) { int count = slowsjPToLowerRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerRussian( chpin, chpout ); return count; } int slowsjPToUpper( char * chpin, char * chpout ) { int count = slowsjPToUpperRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperRussian( chpin, chpout ); return count; } /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. Some are entirely above and some entirely below. JIS Roman/Greek/Russian doesn't include caseless. For converting katakana to hiragana, I can test whether the result is valid before returning it. int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. */ /* So, the initial, standard function headers: int slowsjIsCntrl( unsigned char * mbc ) As near as I can tell, all one byte, between 0 and 0x1f, inclusive. Returns byte count. int slowsjIsSpace( unsigned char * mbc ) Adds one two byte version of the space character. Returns byte count. int slowsjIsPrint( unsigned char * mbc ) All graphic characters, including non-control space characters. Returns byte count. int slowsjIsGraph( unsigned char * mbc ) All graphic non-space characters. Returns byte count. int slowsjIsPunct( unsigned char * mbc ) All non-word-forming characters. Will later be subdivided for the richer JIS set. Returns byte count. int slowsjIsDigit( unsigned char * mbc ) The standard digits 0..9, as specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsXDigit( unsigned char * mbc ) The standard hexadecimal digits specified in ANSI/ISO ctype. 
Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsAlpha( unsigned char * mbc ) Characters used to form words, as used by non-programmers. Does not include the standard decimal digits, but does include the kanji numbers. Includes a lot of caseless characters, of course. Returns byte count. int slowsjIsAlNum( unsigned char * mbc ) Characters used to form words, as used by programmers, thus including digits. Returns byte count. int slowsjIsUpper( unsigned char * mbc ) Upper cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. int slowsjIsLower( unsigned char * mbc ) Lower cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. int slowsjToLower( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to lower case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjIs1Byte( unsigned char * mbc ) Valid one byte character. Returns byte count. int slowsjIs2Byte( unsigned char * mbc ) Valid two byte character? Returns byte count. int slowsjCouldBe2Byte( unsigned char * mbc ) A combination of valid lead byte and valid tail byte? Returns byte count. The second, or fast version slowsjIsXX() functions will use constants of the pattern slowsjIsXX_k. The constants and the general call will also be provided in the source header, as mentioned above, for optimization: int slowsjCType( unsigned long type, unsigned char * mbc ) Test the type formed by the bit-or of the type constants passed as the first parameter. 
Returns byte count on test true or zero on test false. The initial slow version functions will have names of the pattern slow_slowsjIsXX() so they can co-exist during debugging. slowsjrIsXX()? Now, some of the foreseeable necessary extensions: int slowsjIsMath( unsigned char * mbc ) The plethora of math and logic symbols in JIS. Returns byte count. int slowsjIsUnit( unsigned char * mbc ) The plethora of unit symbols in JIS, but not system specific extensions like m2. Does not include kanji. Returns byte count. int slowsjIsQuote( unsigned char * mbc ) The plethora of quoting and parenthetic characters in JIS. Returns byte count. int slowsjIsKanji( unsigned char * mbc ) All the proper kanji characters. Returns byte count. int isNumberKanji( unsigned char * mbc ) All the number kanji, including the special ones used, for example, on currency and bank notes. Returns byte count. int slowsjIsKana( unsigned char * mbc ) All the katakana and hiragana characters, including the one byte katakana. Also including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsKata( unsigned char * mbc ) All the katakana, including the SJIS one byte katakana, but not the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsHira( unsigned char * mbc ) All the hiragana, not including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjToKata( unsigned char * mbcin, unsigned char * mbcout ) Converts hiragana to katakana. Returns byte count converted or zero. int slowsjToHira( unsigned char * mbcin, unsigned char * mbcout ) Converts katakana to hiragana, where possible. Moves the unconvertable katakana as they are. Does not convert the one byte katakana. Returns byte count converted or zero. int slowsjTo16Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts the one byte katakana to two byte katakana. 
Round trip slowsjTo16Kata() -> slowsjTo8Kata() should be guaranteeable. Returns byte count converted or zero. int slowsjTo8Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts two byte katakana to one byte katakana, where possible. Round trip slowsjTo8Kata() -> slowsjTo16Kata() may be guaranteeable, I'm not sure yet. Returns byte count converted or zero. Some of the hypothetical extensions: int slowsjIsMusic( unsigned char * mbc ) The music symbols in JIS. Returns byte count. int slowsjIsKanjiUnit( unsigned char * mbc ) The kanji version of units, including also ten, hundred, thousand, ten-thousand, etc. Returns byte count. int slowsjIsRoman( unsigned char * mbc ) All the JIS Roman (two byte Latin) characters. Returns byte count. int slowsjIsGreek( unsigned char * mbc ) All the JIS Greek characters. Returns byte count. int slowsjIsRussian( unsigned char * mbc ) All the JIS Russian characters. Returns byte count. int slowsjIsLatin( unsigned char * mbc ) All the Latin characters, including the two byte Roman (Latin) and one byte Latin. Returns byte count. int slowsjToRoman( unsigned char * mbcin, unsigned char * mbcout ) Convert one byte Latin to two byte JIS Roman (Latin). Returns byte count converted or zero. int slowsjToLatin( unsigned char * mbcin, unsigned char * mbcout ) Convert two byte JIS Roman (Latin) to one byte Latin. Returns byte count converted or zero. */
\ No newline at end of file
1+/* slowsjctype.c v00.00.01.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, beginning April 2001. // joel_rees@sannet.ne.jp // // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char pointers. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. 
Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. // Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. 
The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ /* Primary references for the ranges chosen below: // // Character palette from Apple's Kotoeri input method, systems 7/8/9. // Publisher: Apple, included with Apple's Macintosh operating systems. // The character palettes since sys. 8.0 or 8.1 have included primary pronunciations, // as well as JIS, kuten, and UNICODE assignments, in a detailed view. // Since at least sys. 8.5 or 8.6, a flag appears when a non-standard character is selected. // Newer versions track the changes to the various standards. // // Pasokon/Waapuro Kanji Jiten, 1987 Edition // Compiler: Tsutomu Uegaki; Publisher: Natsume-sha (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Contains a nice rectangular arrangement of Kanji on pages 588-599. // // Waapuro/Pasokon Saishin Kanji Jiten, 1st Edition (1994) // Compiler: Shougakukan Dictionary Editors Department; // Publisher: Shougakukan (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Includes a list of the proposed annex characters, with annex numbers. // The annex characters have been assigned actual codes since this edition was published. // // Pasokon Yougo Jiten, 1992-93 Edition // Authors: Shigeru Okamoto, Ichirou Senba, Yoshiaki Nakamura, Kazuko Takahashi; // Publisher: Gijutsu Hyouron-sha (Shinjuku-ku). // Dictionary of personal computer terminology, // particularly referenced the JIS/ISO/ANSI 8-bit character tables starting page 409. 
*/ #include "sjctypenv.h" #include "sj8bitChars.h" #include "sj16bitChars.h" #include "slowsjctype.h" /* Because char is probably signed, // it is usually liable to induce errors to use escaped char constant notation. // '\x80' may well be something like 0xffffff80, rather than 0x80. // Hopefully, I have been consistent about this. <erg/> // Note the problems when comparing a char variable with a character constant: // char scan; . . . while ( scan <= 0x9f ) // will produce an infinite loop, which is probably not the desired effect. // 0x9f is an integer equal to decimal 159. // '\x9f' is a char and promotes to integer with sign extension: // ( -( 256 - 159 ) ) == ( -97 ) // Two's complement. // . . . while ( scan <= '\x9f' ) // will probably produce the desired result, but by an unexpected calculation. // For instance, // scan = 0x9e; if ( scan < '\x9f' ) // yields true because -98 is less than -97, not because 158 is less than 159. // I tend to forget which is which in the middle of loops, // so I usually use long integers in loops (which is a good idea anyway) // and avoid comparing to integer constants. // This is also a reason I use symbolic constants instead of directly using characters. // // This shows one of the many reasons for having some means of dialect control, // instead of constraining the one-and-only standard in ways that turn out to be non-optimal. */ /* Cleared the unwanted dependency on sjctypenv.h (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, as noted below. // This mod by Joel Matthew Rees, released under original terms of use. 
*/ int slowsjIsPOneByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int b = * ( (ubyte *) chp ); return b < 0x80 || ( b >= 0xa1 && b <= 0xdf ); } int slowsjIsPHighByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bHi = ( (ubyte *) chp )[ 0 ]; return ( bHi >= 0x81 && bHi <= 0x9f ) || ( bHi >= 0xe0 && bHi <= 0xfc ); } int slowsjIsPLowByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo >= 0x40 && bLo <= 0xfc && bLo != 0x7f; } int slowsjIsP7bit( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo < 0x80; } int slowsjPGuessCount( char * chp ) { return ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) ) ? 2 : slowsjIsPOneByte( chp ) ? 1 : 0; } int slowsjIsPCntrl( char * chp ) { int uch = (ubyte) chp[ 0 ]; return ( uch <= 0x1f || uch == 0x7f ) ? 1 : 0; /* DEL added JMR2001.05.23 */ /* The standard doesn't know for unit separator. */ } int slowsjIsPSpace( char * chp ) { ubyte * uchp = (ubyte *) chp; switch ( * uchp ) { case b7_HT: case b7_LF: case b7_VT: case b7_FF: case b7_CR: case b7_SP: return 1; default: return ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) ? 2 : 0; /* 0x8140 is sjis 2-byte space */ } } int slowsjIsPDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ZERO[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_ZERO[ 1 ] && b <= b16_NINE[ 1 ] ) ? 2 : 0; } else { return ( b >= b7_ZERO && b <= b7_NINE ) ? 1 : 0; } } int slowsjIsPXDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( ( b >= b16_A[ 1 ] && b <= b16_F[ 1 ] ) || ( b >= b16_a[ 1 ] && b <= b16_f[ 1 ] ) ) ? 2 : slowsjIsPDigit( chp ); } else { return ( ( b >= b7_A && b <= b7_F ) || ( b >= b7_a && b <= b7_f ) ) ? 
1 : slowsjIsPDigit( chp ); } } int slowsjIsPRomanLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_a[ 1 ] && b <= b16_z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_a && b <= b7_z ) ? 1 : 0; } } int slowsjIsPRomanUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_A[ 1 ] && b <= b16_Z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_A && b <= b7_Z ) ? 1 : 0; } } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPRoman( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPRomanUpper( chp ); return result; } int slowsjIsPGreekLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_alpha[ 0 ] ) && ( b >= b16_alpha[ 1 ] && b <= b16_omega[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPGreekUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_ALPHA[ 0 ] ) && ( b >= b16_ALPHA[ 1 ] && b <= b16_OMEGA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPGreek( char * chp ) { int result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); /* assign the upper-case result instead of discarding it */ return result; } int slowsjIsPRussianLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_a[ 0 ] ) && ( b >= b16_Russian_a[ 1 ] && b <= b16_Russian_ya[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPRussianUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_A[ 0 ] ) && ( b >= b16_Russian_A[ 1 ] && b <= b16_Russian_YA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. 
*/ int slowsjIsPRussian( char * chp ) { int result = slowsjIsPRussianLower( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); /* assign the upper-case result instead of discarding it */ return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPUpper( char * chp ) { int result = slowsjIsPRomanUpper( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPLower( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPRussianLower( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPEurAsianAlpha( char * chp ) { int result = slowsjIsPRoman( chp ); if ( result == 0 ) result = slowsjIsPGreek( chp ); if ( result == 0 ) result = slowsjIsPRussian( chp ); return result; } int slowsjIsPQuasiEurAsianAlpha( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_AccentAcute_Prime[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_AccentAcute_Prime[ 1 ] || b == b16_AccentGrave[ 1 ] || b == b16_Umlaut[ 1 ] || b == b16_AccentCircumflex[ 1 ] || b == b16_Overline_Negate[ 1 ] || b == b16_QuarterDash_Hyphen[ 1 ] || b == b16_WavyDash_Tilde[ 1 ] ) ? 2 : 0; } else { return ( b == b7_HYPHEN || b == b7_ACCENTGRAVE || b == b7_TILDE || b == b7_CARET ) ? 1 : 0; } } int slowsjIsPHiragana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_hiraganaSub_a[ 0 ] ) && ( b >= b16_hiraganaSub_a[ 1 ] && b <= b16_hiragana_ng[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPKatakana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_katakanaSub_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_katakanaSub_a[ 1 ] && b <= b16_katakanaSub_ke[ 1 ] && b != 0x7f ) ? 
2 : 0; } else { return ( ( b >= b8_katakana_wo && b <= b8_katakanaSub_tu ) || ( b >= b8_katakana_a && b <= b8_katakana_ng ) ) ? 1 : 0; } } /* Time biased against katakana, but we don't care on the slow version. */ int slowsjIsPKana( char * chp ) { int result = slowsjIsPHiragana( chp ); if ( result == 0 ) result = slowsjIsPKatakana( chp ); return result; } int slowsjIsPQuasiKana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_DakuTen[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_DakuTen[ 1 ] || b == b16_HanDakuTen[ 1 ] || b == b16_KatakanaRepeat[ 1 ] || b == b16_KatakanaRepeatVoiced[ 1 ] || b == b16_HiraganaRepeat[ 1 ] || b == b16_HiraganaRepeatVoiced[ 1 ] || b == b16_ChoOn[ 1 ] ) ? 2 : 0; } else { return ( b == b8_ChoOn || b == b8_DakuTen || b == b8_HandakuTen ) ? 1 : 0; } } /* This has even time-bias for JIS level 1. */ int slowsjIsPKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) && ( ( bHi == b16_kanji1Low_a[ 0 ] && bLo >= b16_kanji1Low_a[ 1 ] ) || ( bHi > b16_kanji1Low_a[ 0 ] && bHi < b16_kanji1High_ude[ 0 ] ) || ( bHi == b16_kanji1High_ude[ 0 ] && bLo <= b16_kanji1High_ude[ 1 ] ) || ( bHi == b16_kanji2aLow_ichi[ 0 ] && bLo >= b16_kanji2aLow_ichi[ 1 ] ) || ( bHi > b16_kanji2aLow_ichi[ 0 ] && bHi <= b16_kanji2aHigh_jou[ 0 ] ) /* The rows at the end of 2a and beginning of 2b are complete. */ || ( bHi >= b16_kanji2bLow_you[ 0 ] && bHi <= b16_kanji2bHigh_hikaru[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru[ 0 ] && bLo <= b16_kanji2bHigh_hikaru[ 1 ] ) ) ) return 2; else return 0; } /* This is completely time-biased against kanji, and a little harder to mentally verify. 
{ ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( !slowsjIsPHighByte( chp ) || !slowsjIsPLowByte( chp + 1 ) || bHi < b16_kanji1Low_a_sub[ 0 ] || ( bHi == b16_kanji1Low_a_sub[ 0 ] && bLo < b16_kanji1Low_a_sub[ 1 ] ) || ( bHi == b16_kanji1High_ude_arm[ 0 ] && bLo > b16_kanji1High_ude_arm[ 1 ] && bLo < b16_kanji2aLow_ichi_formalOne[ 1 ] ) || ( bHi > b16_kanji2aHigh_ude_arm[ 0 ] && bHi < b16_kanji2bLow_yo_e040[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru_eaa4[ 0 ] && bLo > b16_kanji2bHigh_hikaru_eaa4[ 1 ] ) || bHi > b16_kanji2bHigh_hikaru_eaa4[ 0 ] ) return 0; else return 2; } */ int slowsjIsPQuasiKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_KanjiIbid[ 0 ] ) && ( b >= b16_KanjiIbid[ 1 ] /* This might be a proper Kanji? */ || b <= b16_Ditto[ 1 ] /* Should this be only with European mods? */ || b <= b16_Shime[ 1 ] /* Probably not Kanji? */ || b <= b16_KanjiZero[ 1 ] /* Should this be Kanji? */ || b <= b16_OpenCircle_Maru[ 1 ] /* Often used as fill-in-the-blank. */ || b <= b16_KanjiRepeat[ 1 ] ) )? 2 : 0; } /* Run-time bias against everybody. // Should give fairly even timing in general use // and give best timing for generating tables. */ int slowsjIsPAlpha( char * chp ) { int result = slowsjIsPKanji( chp ); if ( result == 0 ) result = slowsjIsPKana( chp ); if ( result == 0 ) result = slowsjIsPEurAsianAlpha( chp ); return result; } /* Use the same bias as alpha, just to be obnoxious. */ int slowsjIsPQuasiAlpha( char * chp ) { int result = slowsjIsPQuasiKanji( chp ); if ( result == 0 ) result = slowsjIsPQuasiKana( chp ); if ( result == 0 ) result = slowsjIsPQuasiEurAsianAlpha( chp ); return result; } /* Bias? What bias? */ int slowsjIsPAlNum( char * chp ) { int result = slowsjIsPDigit( chp ); if ( result == 0 ) result = slowsjIsPAlpha( chp ); return result; } /* Bias? What bias? 
*/ int slowsjIsPAlNumQuasi( char * chp ) { int result = slowsjIsPQuasiAlpha( chp ); if ( result == 0 ) result = slowsjIsPAlNum( chp ); return result; } int slowsjIsPLineDraw( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_LineDraw_1H[ 0 ] ) && ( b >= b16_LineDraw_1H[ 1 ] && b <= b16_LineDraw_1H2V[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPPunct( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ToTen[ 0 ] ) /* Nice of the JIS committee to put them all together. */ { b = uchp[ 1 ]; return ( b != 0x7f /* Check and excuse later */ && ( ( b >= b16_ToTen[ 1 ] && b <= b16_Geta[ 1 ] ) || ( b >= b16_Element[ 1 ] && b <= b16_Intersection[ 1 ] ) || ( b >= b16_Conjunction_And[ 1 ] && b <= b16_Exists[ 1 ] ) || ( b >= b16_Angle[ 1 ] && b <= b16_DoubleIntegral[ 1 ] ) || ( b >= b16_Angstrom[ 1 ] && b <= b16_Paragraph[ 1 ] ) || ( b == b16_CompositionCircle[ 1 ] ) ) ) ? 2 : 0; } else { return ( ( b >= b7_EXCLAIM && b <= b7_SLASH ) || ( b >= b7_COLON && b <= b7_ATEACH ) || ( b >= b7_LEFTBRACKET && b <= b7_ACCENTGRAVE ) || ( b >= b7_LEFTBRACE && b <= b7_TILDE ) || ( b >= b8_Kuten && b <= b8_ChuTen ) || ( b == b8_ChoOn ) || ( b >= b8_DakuTen && b <= b8_HandakuTen ) ) ? 1 : 0; } } int slowsjIsPGraph( char * chp ) { int result = slowsjIsPAlNum( chp ); if ( result == 0 ) result = slowsjIsPPunct( chp ); return result; } int slowsjIsPPrint( char * chp ) { ubyte * uchp = (ubyte *) chp; if ( * uchp == b7_SP ) return 1; else if ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) return 2; else return slowsjIsPGraph( chp ); } /* Macro to isprint() works just fine because there are no two-byte control characters. int slowsjIsP2Byte( char * chp ) {} */ /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. // Some are entirely above and some entirely below. // JIS Roman/Greek/Russian doesn't include any caseless characters in my materials. 
// But if they did I could test the converted character for validity before returning it. // Just for fun, I'll include the test anyway. */ int slowsjPToLowerRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = (ubyte) ( uchpin[ 0 ] + ( b7_a - b7_A ) ); break; case 2: temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_a[ 1 ] - b16_A[ 1 ] ) ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanLower( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = (ubyte) ( uchpin[ 0 ] - ( b7_a - b7_A ) ); break; case 2: temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_a[ 1 ] - b16_A[ 1 ] ) ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ) ); /* No gap */ } if ( count == 2 && slowsjIsPGreekLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekLower( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] 
= uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ) ); /* No gap */ } if ( count == 2 && slowsjIsPGreekUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianUpper( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ) ); if ( temp[ 1 ] >= 0x7f ) /* Adjust for the gap. */ ++temp[ 1 ]; /* Borland didn't like += 1. */ } if ( count == 2 && slowsjIsPRussianLower( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianLower( chpin ); /* Checks the gap. */ ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ) ); if ( uchpin[ 1 ] > 0x7f ) /* Adjust for the gap (0x7f already filtered above). */ --temp[ 1 ]; /* Borland didn't like -= 1. */ } if ( count == 2 && slowsjIsPRussianUpper( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } /* Again, time-biased in favor of the most likely. (Russian and Greek are not as commonly used.) // Would be faster to test directly, but that increases logical coupling // (increases the chance for algorithmic errors). // Reducing errors is a higher priority than speed. 
*/ int slowsjPToLower( char * chpin, char * chpout ) { int count = slowsjPToLowerRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerRussian( chpin, chpout ); return count; } int slowsjPToUpper( char * chpin, char * chpout ) { int count = slowsjPToUpperRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperRussian( chpin, chpout ); return count; } int slowsjPToKatakana( char * chpin, char * chpout ) { int count = slowsjIsPHiragana( chpin ); ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] + ( b16_katakanaSub_a[ 1 ] - b16_hiraganaSub_a[ 1 ] ) ); if ( temp[ 1 ] >= 0x7f ) /* Adjust for the gap. */ ++temp[ 1 ]; /* Borland didn't like += 1. */ } if ( count == 2 && slowsjIsPKatakana( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToHiragana( char * chpin, char * chpout ) { int count = slowsjIsPKatakana( chpin ); /* Checks the gap. */ ubyte * uchpin = (ubyte *) chpin; ubyte * uchpout = (ubyte *) chpout; ubyte temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = uchpin[ 0 ]; temp[ 1 ] = (ubyte) ( uchpin[ 1 ] - ( b16_katakanaSub_a[ 1 ] - b16_hiraganaSub_a[ 1 ] ) ); if ( uchpin[ 1 ] > 0x7f ) /* Adjust for the gap (0x7f already filtered above). */ --temp[ 1 ]; /* Borland didn't like -= 1. */ } if ( count == 2 && slowsjIsPHiragana( (char *) temp ) == count ) { uchpout[ 0 ] = temp[ 0 ]; uchpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. Some are entirely above and some entirely below. JIS Roman/Greek/Russian doesn't include caseless. For converting katakana to hiragana, I can test whether the result is valid before returning it. 
int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. */ /* So, the initial, standard function headers: int slowsjIsCntrl( unsigned char * mbc ) As near as I can tell, all one byte, between 0 and 0x1f, inclusive. Returns byte count. int slowsjIsSpace( unsigned char * mbc ) Adds one two byte version of the space character. Returns byte count. int slowsjIsPrint( unsigned char * mbc ) All graphic characters, including non-control space characters. Returns byte count. int slowsjIsGraph( unsigned char * mbc ) All graphic non-space characters. Returns byte count. int slowsjIsPunct( unsigned char * mbc ) All non-word-forming characters. Will later be subdivided for the richer JIS set. Returns byte count. int slowsjIsDigit( unsigned char * mbc ) The standard digits 0..9, as specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsXDigit( unsigned char * mbc ) The standard hexadecimal digits specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsAlpha( unsigned char * mbc ) Characters used to form words, as used by non-programmers. Does not include the standard decimal digits, but does include the kanji numbers. Includes a lot of caseless characters, of course. Returns byte count. int slowsjIsAlNum( unsigned char * mbc ) Characters used to form words, as used by programmers, thus including digits. Returns byte count. int slowsjIsUpper( unsigned char * mbc ) Upper cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. 
int slowsjIsLower( unsigned char * mbc ) Lower cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. int slowsjToLower( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to lower case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjIs1Byte( unsigned char * mbc ) Valid one byte character. Returns byte count. int slowsjIs2Byte( unsigned char * mbc ) Valid two byte character? Returns byte count. int slowsjCouldBe2Byte( unsigned char * mbc ) A combination of valid lead byte and valid tail byte? Returns byte count. The second, or fast version slowsjIsXX() functions will use constants of the pattern slowsjIsXX_k. The constants and the general call will also be provided in the source header, as mentioned above, for optimization: int slowsjCType( unsigned long type, unsigned char * mbc ) Test the type formed by the bit-or of the type constants passed as the first parameter. Returns byte count on test true or zero on test false. The initial slow version functions will have names of the pattern slow_slowsjIsXX() so they can co-exist during debugging. slowsjrIsXX()? Now, some of the foreseeable necessary extensions: int slowsjIsMath( unsigned char * mbc ) The plethora of math and logic symbols in JIS. Returns byte count. int slowsjIsUnit( unsigned char * mbc ) The plethora of unit symbols in JIS, but not system specific extensions like m2. Does not include kanji. Returns byte count. int slowsjIsQuote( unsigned char * mbc ) The plethora of quoting and parenthetic characters in JIS. Returns byte count. 
int slowsjIsKanji( unsigned char * mbc ) All the proper kanji characters. Returns byte count. int isNumberKanji( unsigned char * mbc ) All the number kanji, including the special ones used, for example, on currency and bank notes. Returns byte count. int slowsjIsKana( unsigned char * mbc ) All the katakana and hiragana characters, including the one byte katakana. Also including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsKata( unsigned char * mbc ) All the katakana, including the SJIS one byte katakana, but not the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsHira( unsigned char * mbc ) All the hiragana, not including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjToKata( unsigned char * mbcin, unsigned char * mbcout ) Converts hiragana to katakana. Returns byte count converted or zero. int slowsjToHira( unsigned char * mbcin, unsigned char * mbcout ) Converts katakana to hiragana, where possible. Moves the unconvertable katakana as they are. Does not convert the one byte katakana. Returns byte count converted or zero. int slowsjTo16Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts the one byte katakana to two byte katakana. Round trip slowsjTo16Kata() -> slowsjTo8Kata() should be guaranteeable. Returns byte count converted or zero. int slowsjTo8Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts two byte katakana to one byte katakana, where possible. Round trip slowsjTo8Kata() -> slowsjTo16Kata() may be guaranteeable, I'm not sure yet. Returns byte count converted or zero. Some of the hypothetical extensions: int slowsjIsMusic( unsigned char * mbc ) The music symbols in JIS. Returns byte count. int slowsjIsKanjiUnit( unsigned char * mbc ) The kanji version of units, including also ten, hundred, thousand, ten-thousand, etc. Returns byte count. 
int slowsjIsRoman( unsigned char * mbc ) All the JIS Roman (two byte Latin) characters. Returns byte count. int slowsjIsGreek( unsigned char * mbc ) All the JIS Greek characters. Returns byte count. int slowsjIsRussian( unsigned char * mbc ) All the JIS Russian characters. Returns byte count. int slowsjIsLatin( unsigned char * mbc ) All the Latin characters, including the two byte Roman (Latin) and one byte Latin. Returns byte count. int slowsjToRoman( unsigned char * mbcin, unsigned char * mbcout ) Convert one byte Latin to two byte JIS Roman (Latin). Returns byte count converted or zero. int slowsjToLatin( unsigned char * mbcin, unsigned char * mbcout ) Convert two byte JIS Roman (Latin) to one byte Latin. Returns byte count converted or zero. */
\ No newline at end of file
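The comments in slowsjctype.c warn that a plain char may be signed, so every test there reads the byte through an unsigned cast before comparing it with a range. A minimal sketch of that idea follows. The helper names is_sjis_lead_byte() and count_sjis_chars() are hypothetical and are not part of the library; the byte ranges are copied from slowsjIsPHighByte() and slowsjIsPLowByte() in the diff above.

```c
#include <stddef.h>

/* Hypothetical helper (not in slowsjctype.c): is this a Shift-JIS lead byte?
   The byte is read as unsigned char first, so 0x81..0xfc never turn negative,
   whatever the signedness of plain char on the target compiler. */
static int is_sjis_lead_byte( const char * p )
{
    int b = (unsigned char) p[ 0 ];   /* always 0..255 */
    return ( b >= 0x81 && b <= 0x9f ) || ( b >= 0xe0 && b <= 0xfc );
}

/* Hypothetical helper: count characters in a NUL-terminated Shift-JIS string.
   Steps two bytes when a lead byte is followed by a byte in the trail range
   (0x40..0xfc, excluding 0x7f); otherwise steps one byte. A lead byte followed
   by the terminating NUL is counted as a single (malformed) byte, so the scan
   never runs past the end of the string. */
static size_t count_sjis_chars( const char * s )
{
    size_t n = 0;
    while ( *s != '\0' )
    {
        int lo = (unsigned char) s[ 1 ];
        if ( is_sjis_lead_byte( s ) && lo >= 0x40 && lo <= 0xfc && lo != 0x7f )
            s += 2;
        else
            s += 1;
        ++n;
    }
    return n;
}
```

Like slowsjPGuessCount(), this only guesses from byte ranges and does not reject unassigned code points inside those ranges.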
--- a/slowsjctype.h
+++ b/slowsjctype.h
@@ -1 +1 @@
1-/* slowsjctype.h v00.00.00.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, April 2001. // joel_rees@sannet.ne.jp // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. 
// Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. 
In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ #ifndef SLOWSJCTYPE_H #define SLOWSJCTYPE_H /* #include "sjctypenv.h" clearing the unwanted dependency (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, noted below. // This mod by Joel Matthew Rees, released under original terms of use. */ /* Test whether pointing to something I know is a single byte character. // Excluding 0x80, 0xa0, and 0xfd to 0xff because I don't know any better. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPOneByte( char * chp ); /* Test whether pointing to something in the high byte range. // Including 0x80 and excluding 0xa0, because I don't know any better. // Also excluding 0xfd to 0xff for the same reason. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPHighByte( char * chp ); /* Test whether pointing to something in the low byte range. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPLowByte( char * chp ); /* We want to be able to pick out the 7 bit only without worrying about sign. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsP7bit( char * chp ); /* Try to determine the byte count by simple ranges. // Misses the gaps where no characters are defined. // Returns byte count (2, 1 or 0). */ extern int slowsjPGuessCount( char * chp ); /* As near as I can tell, all one byte, between 0 and 0x1f, inclusive. // Returns byte count (1 or 0). 
*/ extern int slowsjIsPCntrl( char * chp ); /* Adds one two byte version of the space character. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPSpace( char * chp ); /* The standard digits 0..9, as specified in ANSI/ISO ctype. // Includes both one and two byte digits. // Does not include numerical kanji. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPDigit( char * chp ); /* The standard hexadecimal digits specified in ANSI/ISO ctype. // Includes both one and two byte digits. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPXDigit( char * chp ); /* All the JIS Roman (Latin) characters. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPRomanLower( char * chp ); extern int slowsjIsPRomanUpper( char * chp ); extern int slowsjIsPRoman( char * chp ); /* All the JIS Greek characters. // Returns byte count (2 or 0). */ extern int slowsjIsPGreekLower( char * chp ); extern int slowsjIsPGreekUpper( char * chp ); extern int slowsjIsPGreek( char * chp ); /* All the JIS Russian characters. // Returns byte count (2 or 0). */ extern int slowsjIsPRussianLower( char * chp ); extern int slowsjIsPRussianUpper( char * chp ); extern int slowsjIsPRussian( char * chp ); /* Lower cased characters, includes 8 bit JIS-Latin, // 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPLower( char * chp ); /* Upper cased characters, includes 8 bit JIS-Latin, // 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPUpper( char * chp ); /* 8 bit JIS-Latin, 16 bit JIS-Roman, -Greek, and -Russian characters, // but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPEurAsianAlpha( char * chp ); /* Accent characters for JIS-Roman, -Greek, and -Russian characters, and hyphen. // Includes tilde, and 16-bit wavy dash because it looks like tilde. 
// Also includes 8-bit caret because it is sometimes used as circumflex, // and 16-bit overscore because it is sometimes used as a vowel lengthener in romaji. // The one-byte characters here are usually not considered part of identifiers. // Whether the two byte characters should be is left to the user. // Returns byte count (2, 1, or 0 ). */ int slowsjIsPQuasiEurAsianAlpha( char * chp ); /* Kind of wanted to split out the modified kana with an ismodified(), // but not this time around. */ /* All the hiragana, including modified. No modifiers. // Returns byte count (2 or 0 ). */ extern int slowsjIsPHiragana( char * chp ); /* All the katakana, including two byte modified and SJIS one byte. No modifiers. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPKatakana( char * chp ); /* All the kana, including two byte modified and SJIS one byte. No modifiers. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPKana( char * chp ); /* The kana word modifiers. // These are not kana, but are often used in words. // A computer language that distinguishes between upper and lower case identifiers // would also distinguish between katakana and hiragana, // and would accept all of these as valid identifier characters // and distinguish identifiers with these from the combined and un-abbreviated forms. // Cho-on (naga-oto?) is one that is really hard to say is not kana, // but it is sometimes used as a dash, too. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPQuasiKana( char * chp ); /* All the proper kanji characters. // Returns byte count (2 or 0). // (Some proprietary SJIS have one-byte Kanji.) */ extern int slowsjIsPKanji( char * chp ); /* The Kanji word modifiers. // These are not Kanji, but are often used in words. // A computer language that distinguishes between upper and lower case identifiers // would probably accept all of these as valid identifier characters // and distinguish identifiers with these from the un-abbreviated forms. 
// Returns byte count (2 or 0). */ extern int slowsjIsPQuasiKanji( char * chp ); /* Characters used to form words, as used by non-programmers. // Does not include the standard decimal digits, // but does include the kanji numbers. // Includes a lot of caseless characters, of course. // Returns byte count (2 or 0). */ extern int slowsjIsPAlpha( char * chp ); /* For completeness. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPQuasiAlpha( char * chp ); /* Characters used to form words, as used by programmers, thus including digits. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPAlNum( char * chp ); /* Becoming obsessive about completeness? // Okay, it was easy to drag the pieces together. So sue me. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPAlNumQuasi( char * chp ); /* Line drawing characters. // Does NOT include the IBM/MS line drawing set. // Returns byte count (2 or 0). */ extern int slowsjIsPLineDraw( char * chp ); /* All non-word-forming characters. // Will later be subdivided for the richer JIS set. Maybe. // Includes line drawing characters ("not alnum", delimiter). // Returns byte count (2, 1, or 0). */ extern int slowsjIsPPunct( char * chp ); /* All graphic non-space characters. // Plauger's explanation of the C standard says "one print position". // But it doesn't say anything about the width of that print position. // I'm going to assume that he means non-zero when not combined. // Besides, the standard he quotes only says "printing". // Returns byte count (2, 1, or 0). */ extern int slowsjIsPGraph( char * chp ); /* All graphic characters, including non-control space characters. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPPrint( char * chp ); /* Valid two byte characters only. // No two-byte control characters, so isprint() should work. // Returns false (0) or true (1). 
-- Comment added JMR2001.05.31 */ #define slowsjIsP2Byte( chp ) ( slowsjIsPPrint( chp ) == 2 ) /* Converts cased word forming characters to lower case, // including 8 bit ASCII/JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, // but no kanji/kana. // Overwrites the character(s) at the destination pointer, chpout. // Returns byte count converted (2, 1, or 0). */ extern int slowsjPToLowerRoman( char * chpin, char * chpout ); extern int slowsjPToLowerGreek( char * chpin, char * chpout ); extern int slowsjPToLowerRussian( char * chpin, char * chpout ); extern int slowsjPToLower( char * chpin, char * chpout ); extern int slowsjPToUpperRoman( char * chpin, char * chpout ); extern int slowsjPToUpperGreek( char * chpin, char * chpout ); extern int slowsjPToUpperRussian( char * chpin, char * chpout ); extern int slowsjPToUpper( char * chpin, char * chpout ); /* Converts cased word forming characters to upper case, */ #endif /* ifndef SLOWSJCTYPE_H */
\ No newline at end of file
1+/* slowsjctype.h v00.00.00.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, April 2001. // joel_rees@sannet.ne.jp // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. 
// Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. 
In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ #ifndef SLOWSJCTYPE_H #define SLOWSJCTYPE_H /* #include "sjctypenv.h" clearing the unwanted dependency (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, noted below. // This mod by Joel Matthew Rees, released under original terms of use. */ /* Test whether pointing to something I know is a single byte character. // Excluding 0x80, 0xa0, and 0xfd to 0xff because I don't know any better. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPOneByte( char * chp ); /* Test whether pointing to something in the high byte range. // Including 0x80 and excluding 0xa0, because I don't know any better. // Also excluding 0xfd to 0xff for the same reason. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPHighByte( char * chp ); /* Test whether pointing to something in the low byte range. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsPLowByte( char * chp ); /* We want to be able to pick out the 7 bit only without worrying about sign. // Returns 0 (false) or 1 (true). -- Changed from bool to int JMR2001.05.31 */ extern int slowsjIsP7bit( char * chp ); /* Try to determine the byte count by simple ranges. // Misses the gaps where no characters are defined. // Returns byte count (2, 1 or 0). */ extern int slowsjPGuessCount( char * chp ); /* As near as I can tell, all one byte, between 0 and 0x1f, inclusive. // Returns byte count (1 or 0). 
*/ extern int slowsjIsPCntrl( char * chp ); /* Adds one two byte version of the space character. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPSpace( char * chp ); /* The standard digits 0..9, as specified in ANSI/ISO ctype. // Includes both one and two byte digits. // Does not include numerical kanji. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPDigit( char * chp ); /* The standard hexadecimal digits specified in ANSI/ISO ctype. // Includes both one and two byte digits. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPXDigit( char * chp ); /* All the JIS Roman (Latin) characters. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPRomanLower( char * chp ); extern int slowsjIsPRomanUpper( char * chp ); extern int slowsjIsPRoman( char * chp ); /* All the JIS Greek characters. // Returns byte count (2 or 0). */ extern int slowsjIsPGreekLower( char * chp ); extern int slowsjIsPGreekUpper( char * chp ); extern int slowsjIsPGreek( char * chp ); /* All the JIS Russian characters. // Returns byte count (2 or 0). */ extern int slowsjIsPRussianLower( char * chp ); extern int slowsjIsPRussianUpper( char * chp ); extern int slowsjIsPRussian( char * chp ); /* Lower cased characters, includes 8 bit JIS-Latin, // 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPLower( char * chp ); /* Upper cased characters, includes 8 bit JIS-Latin, // 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPUpper( char * chp ); /* 8 bit JIS-Latin, 16 bit JIS-Roman, -Greek, and -Russian characters, // but no kanji, of course. // Returns byte count (2, 1, or 0 ). */ extern int slowsjIsPEurAsianAlpha( char * chp ); /* Accent characters for JIS-Roman, -Greek, and -Russian characters, and hyphen. // Includes tilde, and 16-bit wavy dash because it looks like tilde. 
// Also includes 8-bit caret because it is sometimes used as circumflex, // and 16-bit overscore because it is sometimes used as a vowel lengthener in romaji. // The one-byte characters here are usually not considered part of identifiers. // Whether the two byte characters should be is left to the user. // Returns byte count (2, 1, or 0 ). */ int slowsjIsPQuasiEurAsianAlpha( char * chp ); /* Kind of wanted to split out the modified kana with an ismodified(), // but not this time around. */ /* All the hiragana, including modified. No modifiers. // Returns byte count (2 or 0 ). */ extern int slowsjIsPHiragana( char * chp ); /* All the katakana, including two byte modified and SJIS one byte. No modifiers. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPKatakana( char * chp ); /* All the kana, including two byte modified and SJIS one byte. No modifiers. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPKana( char * chp ); /* The kana word modifiers. // These are not kana, but are often used in words. // A computer language that distinguishes between upper and lower case identifiers // would also distinguish between katakana and hiragana, // and would accept all of these as valid identifier characters // and distinguish identifiers with these from the combined and un-abbreviated forms. // Cho-on (naga-oto?) is one that is really hard to say is not kana, // but it is sometimes used as a dash, too. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPQuasiKana( char * chp ); /* All the proper kanji characters. // Returns byte count (2 or 0). // (Some proprietary SJIS have one-byte Kanji.) */ extern int slowsjIsPKanji( char * chp ); /* The Kanji word modifiers. // These are not Kanji, but are often used in words. // A computer language that distinguishes between upper and lower case identifiers // would probably accept all of these as valid identifier characters // and distinguish identifiers with these from the un-abbreviated forms. 
// Returns byte count (2 or 0). */ extern int slowsjIsPQuasiKanji( char * chp ); /* Characters used to form words, as used by non-programmers. // Does not include the standard decimal digits, // but does include the kanji numbers. // Includes a lot of caseless characters, of course. // Returns byte count (2 or 0). */ extern int slowsjIsPAlpha( char * chp ); /* For completeness. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPQuasiAlpha( char * chp ); /* Characters used to form words, as used by programmers, thus including digits. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPAlNum( char * chp ); /* Becoming obsessive about completeness? // Okay, it was easy to drag the pieces together. So sue me. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPAlNumQuasi( char * chp ); /* Line drawing characters. // Does NOT include the IBM/MS line drawing set. // Returns byte count (2 or 0). */ extern int slowsjIsPLineDraw( char * chp ); /* All non-word-forming characters. // Will later be subdivided for the richer JIS set. Maybe. // Includes line drawing characters ("not alnum", delimiter). // Returns byte count (2, 1, or 0). */ extern int slowsjIsPPunct( char * chp ); /* All graphic non-space characters. // Plauger's explanation of the C standard says "one print position". // But it doesn't say anything about the width of that print position. // I'm going to assume that he means non-zero when not combined. // Besides, the standard he quotes only says "printing". // Returns byte count (2, 1, or 0). */ extern int slowsjIsPGraph( char * chp ); /* All graphic characters, including non-control space characters. // Returns byte count (2, 1, or 0). */ extern int slowsjIsPPrint( char * chp ); /* Valid two byte characters only. // No two-byte control characters, so isprint() should work. // Returns false (0) or true (1). 
-- Comment added JMR2001.05.31 */ #define slowsjIsP2Byte( chp ) ( slowsjIsPPrint( chp ) == 2 ) /* Converts cased word forming characters to lower case, // including 8 bit ASCII/JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, // but no kanji/kana. // Overwrites the character(s) at the destination pointer, chpout. // Returns byte count converted (2, 1, or 0). */ extern int slowsjPToLowerRoman( char * chpin, char * chpout ); extern int slowsjPToLowerGreek( char * chpin, char * chpout ); extern int slowsjPToLowerRussian( char * chpin, char * chpout ); extern int slowsjPToLower( char * chpin, char * chpout ); extern int slowsjPToUpperRoman( char * chpin, char * chpout ); extern int slowsjPToUpperGreek( char * chpin, char * chpout ); extern int slowsjPToUpperRussian( char * chpin, char * chpout ); extern int slowsjPToUpper( char * chpin, char * chpout ); extern int slowsjPToKatakana( char * chpin, char * chpout ); extern int slowsjPToHiragana( char * chpin, char * chpout ); /* Converts cased word forming characters to upper case, */ #endif /* ifndef SLOWSJCTYPE_H */
\ No newline at end of file