Paste: possible-phrases and friends
Author: | randy7 |
Mode: | factor |
Date: | Mon, 22 Dec 2008 14:48:51 |
Plain Text |
USING: fry locals splitting unicode.categories fry grouping sets unicode.case ;
IN: phrases
: (number-of-branches) ( seq -- seq )
length >array [ 1+ ] map ;
: just-alpha ( seq -- seq ) " " split [ [ alpha? ] filter ] map ;
: sentence-possible-phrases ( sentence -- seq )
just-alpha harvest
[ (number-of-branches) ] keep
'[ _ swap clump [ " " join ] map ] map ;
: possible-phrases ( paragraph -- repeated )
all-sentences [ >lower sentence-possible-phrases ] map
concat concat
duplicates harvest
prune >array
[ [ length ] bi@ <=> ] sort reverse ;
: old-possible-phrases ( sentence -- repeated )
sentence-possible-phrases
[ duplicates ] map harvest
[ prune >array ] map ;
: acronyms ( -- seq )
{ "Mr." "Ms." "Mrs." "Dr." "N.B.A" "F.B.I" } ;
: acronym? ( word -- ? )
acronyms member? ;
: final-puncts ( -- seq )
{ "?" "!" "." } ;
: cut-at-first-char ( seq char -- head rest )
dupd '[ _ member? ] find
2dup and [
drop 1+ cut
[ blank? ] trim-left
] [ 2drop f f ] if ;
: cut-at-first-chars ( seq str/char-seq -- head rest )
dup string? [ concat ] unless
cut-at-first-char ;
:: var-vector ( VAR -- )
SYMBOL: VAR V{ } clone VAR set ;
: +>var ( item var -- )
[ get swap suffix ]
[ set ] bi ;
: 2f? ( input1 input2 -- ? )
2dup and f = ;
: all-sentences ( string -- sentences )
"sentences" var-vector
V{ } "sentences" set
(all-sentences) "sentences" get ;
: (one-sentence) ( string -- str1/f str2/f )
final-puncts cut-at-first-chars ;
: (add-head) ( str -- )
"sentences" +>var ;
: (add-remaining) ( str -- )
(add-head) ;
: (all-sentences) ( string -- )
dup empty? [ drop ]
[ (one-sentence)
2f? [ 2drop (add-remaining) ] [
swap (add-head)
(all-sentences)
] if
] if ; recursive
Author: | randy7 |
Mode: | factor |
Date: | Mon, 22 Dec 2008 15:00:41 |
Plain Text |
USING: fry locals splitting unicode.categories fry grouping sets unicode.case sequences arrays math kernel strings
namespaces math.order sorting ;
IN: phrases
: (number-of-branches) ( seq -- seq )
length >array [ 1+ ] map ;
: just-alpha ( seq -- seq ) " " split [ [ alpha? ] filter ] map ;
: sentence-possible-phrases ( sentence -- seq )
just-alpha harvest
[ (number-of-branches) ] keep
'[ _ swap clump [ " " join ] map ] map ;
: old-possible-phrases ( sentence -- repeated )
sentence-possible-phrases
[ duplicates ] map harvest
[ prune >array ] map ;
: acronyms ( -- seq )
{ "Mr." "Ms." "Mrs." "Dr." "N.B.A" "F.B.I" } ;
: acronym? ( word -- ? )
acronyms member? ;
: final-puncts ( -- seq )
{ "?" "!" "." } ;
: cut-at-first-char ( seq char -- head rest )
dupd '[ _ member? ] find
2dup and [
drop 1+ cut
[ blank? ] trim-left
] [ 2drop f f ] if ;
: cut-at-first-chars ( seq str/char-seq -- head rest )
dup string? [ concat ] unless
cut-at-first-char ;
:: var-vector ( VAR -- )
SYMBOL: VAR V{ } clone VAR set ;
: +>var ( item var -- )
[ get swap suffix ]
[ set ] bi ;
: 2f? ( input1 input2 -- ? )
2dup and f = ;
: (one-sentence) ( string -- str1/f str2/f )
final-puncts cut-at-first-chars ;
: (add-head) ( str -- )
"sentences" +>var ;
: (add-remaining) ( str -- )
(add-head) ;
: (all-sentences) ( string -- )
dup empty? [ drop ]
[ (one-sentence)
2f? [ 2drop (add-remaining) ] [
swap (add-head)
(all-sentences)
] if
] if ; recursive
: all-sentences ( string -- sentences )
"sentences" var-vector
V{ } "sentences" set
(all-sentences) "sentences" get ;
: possible-phrases ( paragraph -- repeated )
all-sentences [ >lower sentence-possible-phrases ] map
concat concat
duplicates harvest
prune >array
[ [ length ] bi@ <=> ] sort reverse ;
Author: | mrjbq7 |
Mode: | factor |
Date: | Mon, 22 Dec 2008 15:04:49 |
Plain Text |
: (all-sentences) ( string -- )
"?!." split ;
Author: | randy7 |
Mode: | text |
Date: | Thu, 25 Dec 2008 11:16:34 |
Plain Text |
thanks, it's not entirely the same, because I'm gonna reuse the sentences for things other than phrase finding, and then I'd want the sentence complete with punctuation marks.
I'll also add more logic to better determine end of sentence.
New Annotation