Paste: Taiwan Big5 encoding

Author: Caesar Hu
Mode: factor
Date: Mon, 2 Mar 2009 05:30:46
Plain Text |
! Copyright (C) 2009 Caesar Hu
! See http://factorcode.org/license.txt for BSD license.
USING: assocs byte-arrays combinators io io.encodings
       io.encodings.ascii io.encodings.iana io.files kernel locals math
       math.order math.parser memoize multiline sequences splitting 
       biassocs hashtables values ascii ;
IN: io.encodings.Big5

SINGLETON: Big5

ALIAS: big5 Big5
ALIAS: cp950 Big5
ALIAS: ms950 Big5

Big5 "Big5" register-encoding

<PRIVATE

: cp950.txt-lines ( -- seq )
    ! "cp950.txt" from ...
    ! <http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT>
    "resource:work/io/encodings/Big5/cp950.txt"
    ascii file-lines ;

: (PCL-drop-comments) ( seq -- newseq )
    [ "#" split1 drop ] map harvest ;
 
: (PCL-split-column) ( line -- columns )
    "\t" split 2 head ;

: (PCL-parse-hex) ( s -- n )
    2 short tail hex> ;
 
: (PCL-parse-line) ( line -- code-unicode )
    (PCL-split-column)
    [ (PCL-parse-hex) ] map ;

: process-codetable-lines ( lines -- assoc )
    (PCL-drop-comments)
    [ (PCL-parse-line) ] map ;

VALUE: mapping

cp950.txt-lines process-codetable-lines >hashtable >biassoc to: mapping

: linear ( c1 c2 -- mb )
    [ HEX: 100 * ] dip + ;

: unlinear ( mb -- c1 c2 )
    HEX: 100 /mod ;

: ch->bytes ( char -- bytes )
    dup ascii? [ 1byte-array ] [ unlinear 2byte-array ] if ;

M: Big5 encode-char ( char stream encoding -- )
    drop [
        mapping value-at dup
        [ ch->bytes ]
        [ encode-error ]
        if
    ] dip stream-write ;

: 2bytes? ( ch -- ? ) HEX: 81 HEX: fe between? ;

: 2bytes ( stream byte -- char )
    swap stream-read1 linear mapping at ;

M: Big5 decode-char ( stream encoding -- char )
    drop dup stream-read1 {
        { [ dup not ] [ 2drop f ] }
        { [ dup ascii? ] [ nip mapping at ] }
        { [ dup 2bytes? ] [ 2bytes ] }
        [ 2drop replacement-char ]
    } cond ;

New Annotation

Summary:
Author:
Mode:
Body: