Paste: sentences split
Author: kobi
Mode: factor
Date: Sat, 11 Sep 2010 16:31:03
USING: accessors assocs fry general-utils grouping kernel
locals math.order regexp sequences sorting splitting ;
IN: sentences
! Matches dotted abbreviations such as "e.g" or "U.S.".
: abbrev-pattern ( -- regexp )
    R/ ((\w\.)+\w)|((\w\.){2,}) / ;
! Orders slices by their start index in the underlying string.
: sort-slices-by-from ( slices -- sorted )
    [ [ from>> ] bi@ <=> ] sort ;
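! Splits on runs of . ? ! : and re-attaches each terminator to the
! piece before it: pieces and terminator slices are merged, sorted
! by position, then concatenated pairwise.
: naive-sentence-split ( text -- sentences )
    R/ .[.?!:]+/ [ re-split ] [ all-matching-slices ] bi-curry bi append
    sort-slices-by-from
    2 <groups>
    [ concat trim-spaces ] map
    harvest ;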
! Collects every abbreviation found in the text.
: abbreviations ( text -- abbrevs )
    abbrev-pattern all-matching-subseqs [ trim-spaces ] map ;
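! Pairs each abbreviation with a copy whose ASCII dots are replaced
! by "。", so the naive splitter will not break on them.
: (abbrev-replace-pairs) ( abbrevs -- pairs )
    dup [ { CHAR: . } "。" replace1 ] map zip ;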
! Swaps keys and values, giving the masked -> original mapping.
: reverse-pairs ( pairs -- reversed )
    [ values ] [ keys ] bi zip ;
! Applies every { old new } replacement pair to the text in turn.
: replace-all ( text pairs -- text' )
    [ first2 replace-subseq ] each ;
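! Masks abbreviation dots, splits naively, then restores the dots
! in each resulting sentence.
:: sentence-split ( text -- sentences )
    text abbreviations
    (abbrev-replace-pairs) :> pairs
    text pairs replace-all
    naive-sentence-split
    pairs reverse-pairs '[ _ replace-all ] map ;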