Paste: factor-finder (improved tinkering)

Author: randy7
Mode: factor
Date: Sun, 18 Jan 2009 00:09:29
IN: factor-finder
USING: arrays assocs fry grouping io.encodings.utf8 io.files kernel
       math math.order regexp sequences sets sorting splitting
       ui.tools.search ;


: count-seq ( seq -- seq' ) dup '[ _ swap '[ _ = ] count ] map ;
: count-dups ( seq -- seq ) dup count-seq zip prune >array ;
: just-2+ ( seq -- seq ) [ second 2 >= ] filter ;

: sort-by-count ( seq -- seq ) [ [ second ] bi@ <=> ] sort reverse ;
: read-as-one-string ( seq -- str ) [ utf8 file-contents ] map " " join ;
: consistent-space ( str -- str ) "[\n\r\t]" <regexp> " " re-replace ;
: split-to-words ( str -- seq ) " " split harvest ;
: (prepare-clumps) ( seq n -- sizes ) [ length ] dip min >array [ 1+ ] map ;
: make-words-groups ( count-array words -- groups ) [ swap clump ] curry map ;
: words-to-phrases ( seq -- seq ) [ [ " " join ] map ] map ;
    

: run-factor-finder ( -- seq )
    all-source-files read-as-one-string
    consistent-space split-to-words
    [ 10 (prepare-clumps) ] keep
    make-words-groups
    words-to-phrases
    count-dups just-2+
    sort-by-count ;

Annotation: out-of-memory

Author: randy7
Mode: factor
Date: Sun, 18 Jan 2009 00:10:43
Note that I can't get a result on my computer, as it runs out of memory.
Can anyone tell me whether it works for them?
Or, preferably, suggest a way to make it more efficient or use less memory?
Thanks!
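
One possible direction, offered as an untested sketch rather than a definitive answer: count-seq rescans the whole sequence once per element, so a single pass that tallies phrases in a hashtable should be far cheaper. The word name count-phrases is hypothetical and not part of the paste; the sketch assumes inc-at and >alist from the assocs vocabulary.

```factor
USING: assocs hashtables kernel sequences ;
IN: factor-finder

! Hypothetical replacement for "dup count-seq zip prune >array":
! one pass over the phrases, bumping a per-phrase counter.
: count-phrases ( seq -- alist )
    H{ } clone [ [ inc-at ] curry each ] keep >alist ;
```

The result is a sequence of { phrase count } pairs, so just-2+ and sort-by-count should apply to it unchanged. The joined phrases themselves still have to fit in memory, so this mostly addresses the repeated scanning rather than the total footprint.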

Annotation: fix

Author: randy7
Mode: factor
Date: Sun, 18 Jan 2009 00:26:51
: count-dups ( seq -- seq ) [ dup count-seq zip prune >array ] map ;
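
Since count-dups now maps over each group, its result is one list of { phrase count } pairs per clump size, so the later steps presumably need to be mapped as well. A sketch of how run-factor-finder might look under that assumption, reusing only words from the paste (untested):

```factor
: run-factor-finder ( -- seq )
    all-source-files read-as-one-string
    consistent-space split-to-words
    [ 10 (prepare-clumps) ] keep
    make-words-groups
    words-to-phrases
    count-dups
    ! filter and sort within each clump size
    [ just-2+ sort-by-count ] map ;
```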
