Paste: Taiwan Big5 encoding
Author: | Caesar Hu |
Mode: | factor |
Date: | Mon, 2 Mar 2009 05:30:46 |
Plain Text |
USING: assocs byte-arrays combinators io io.encodings
io.encodings.ascii io.encodings.iana io.files kernel locals math
math.order math.parser memoize multiline sequences splitting
biassocs hashtables values ascii ;
IN: io.encodings.Big5
SINGLETON: Big5
ALIAS: big5 Big5
ALIAS: cp950 Big5
ALIAS: ms950 Big5
Big5 "Big5" register-encoding
<PRIVATE
: cp950.txt-lines ( -- seq )
"resource:work/io/encodings/Big5/cp950.txt"
ascii file-lines ;
: (PCL-drop-comments) ( seq -- newseq )
[ "#" split1 drop ] map harvest ;
: (PCL-split-column) ( line -- columns )
"\t" split 2 head ;
: (PCL-parse-hex) ( s -- n )
2 short tail hex> ;
: (PCL-parse-line) ( line -- code-unicode )
(PCL-split-column)
[ (PCL-parse-hex) ] map ;
: process-codetable-lines ( lines -- assoc )
(PCL-drop-comments)
[ (PCL-parse-line) ] map ;
VALUE: mapping
cp950.txt-lines process-codetable-lines >hashtable >biassoc to: mapping
: linear ( c1 c2 -- mb )
[ HEX: 100 * ] dip + ;
: unlinear ( mb -- c1 c2 )
HEX: 100 /mod ;
: ch->bytes ( char -- bytes )
dup ascii? [ 1byte-array ] [ unlinear 2byte-array ] if ;
M: Big5 encode-char ( char stream encoding -- )
drop [
mapping value-at dup
[ ch->bytes ]
[ encode-error ]
if
] dip stream-write ;
: 2bytes? ( ch -- ? ) HEX: 81 HEX: fe between? ;
: 2bytes ( stream byte -- char )
swap stream-read1 linear mapping at ;
M: Big5 decode-char ( stream encoding -- char )
drop dup stream-read1 {
{ [ dup not ] [ 2drop f ] }
{ [ dup ascii? ] [ nip mapping at ] }
{ [ dup 2bytes? ] [ 2bytes ] }
[ 2drop replacement-char ]
} cond ;
New Annotation