Paste: IN: http-analyze
Author: kobi
Mode: factor
Date: Sat, 4 Sep 2010 12:59:13
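USING: accessors concurrency.combinators concurrency.semaphores
fry general-utils http-part kernel sequences unicode.case ;
QUALIFIED: sets
IN: http-analyze

! Keep only the URLs that pass valid-page?.
: valid-pages ( urls -- urls' )
    [ valid-page? ] filter ;

! Fetch every URL with at most #parallel requests in flight; sift drops f results.
: http-get-all ( urls #parallel -- htmls )
    <semaphore> [ [ http-safe-get nip ] with-semaphore ] curry
    parallel-map sift ;

! Lowercase every address, then remove duplicates.
: unique-emails ( emails -- emails' )
    [ >lower ] map sets:members ;

! Keep URLs for which http-binary? is false, checking at most 8 at a time.
: pages-non-binary ( urls -- urls' )
    8 <semaphore> '[ _ [ http-binary? not ] with-semaphore ] parallel-filter ;

! Keep only URLs whose protocol is plain "http".
: pages-non-secured ( urls -- urls' )
    [ protocol>> "http" = ] filter ;

! Keep only URLs that have a host set, i.e. absolute URLs.
: retain-absolute-urls ( urls -- urls' )
    [ host>> ] filter ;

! Chain the filters above, left to right.
: safe-urls ( urls -- urls' )
    retain-absolute-urls pages-non-secured valid-pages pages-non-binary ;

! Fetch the safe URLs (8 at a time), extract addresses per page, return the unique ones.
: urls>emails ( urls -- emails )
    safe-urls 8 http-get-all [ extract-emails2 ] map concat unique-emails ;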