! Copyright (C) 2009 Caesar Hu ! See http://factorcode.org/license.txt for BSD license. USING: assocs byte-arrays combinators io io.encodings io.encodings.ascii io.encodings.iana io.files kernel locals math math.order math.parser memoize multiline sequences splitting biassocs hashtables values ascii ; IN: io.encodings.Big5 SINGLETON: Big5 ALIAS: big5 Big5 ALIAS: cp950 Big5 ALIAS: ms950 Big5 Big5 "Big5" register-encoding "resource:work/io/encodings/Big5/cp950.txt" ascii file-lines ; : (PCL-drop-comments) ( seq -- newseq ) [ "#" split1 drop ] map harvest ; : (PCL-split-column) ( line -- columns ) "\t" split 2 head ; : (PCL-parse-hex) ( s -- n ) 2 short tail hex> ; : (PCL-parse-line) ( line -- code-unicode ) (PCL-split-column) [ (PCL-parse-hex) ] map ; : process-codetable-lines ( lines -- assoc ) (PCL-drop-comments) [ (PCL-parse-line) ] map ; VALUE: mapping cp950.txt-lines process-codetable-lines >hashtable >biassoc to: mapping : linear ( c1 c2 -- mb ) [ HEX: 100 * ] dip + ; : unlinear ( mb -- c1 c2 ) HEX: 100 /mod ; : ch->bytes ( char -- bytes ) dup ascii? [ 1byte-array ] [ unlinear 2byte-array ] if ; M: Big5 encode-char ( char stream encoding -- ) drop [ mapping value-at dup [ ch->bytes ] [ encode-error ] if ] dip stream-write ; : 2bytes? ( ch -- ? ) HEX: 81 HEX: fe between? ; : 2bytes ( stream byte -- char ) swap stream-read1 linear mapping at ; M: Big5 decode-char ( stream encoding -- char ) drop dup stream-read1 { { [ dup not ] [ 2drop f ] } { [ dup ascii? ] [ nip mapping at ] } { [ dup 2bytes? ] [ 2bytes ] } [ 2drop replacement-char ] } cond ;