Paste: IN: http-analyze

Author: kobi
Mode: factor
Date: Sat, 4 Sep 2010 12:59:13
USING: accessors concurrency.combinators concurrency.semaphores
fry general-utils http-part kernel sequences unicode.case ;
QUALIFIED: sets
IN: http-analyze


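! Keep only the URLs accepted by valid-page? (defined outside this file).
: valid-pages ( urls -- urls' )
    [ valid-page? ] filter ;

! Fetch every URL in parallel, with at most #parallel fetches running at once.
! sift drops any f results from the output.
: http-get-all ( urls #parallel -- htmls )
    <semaphore> [ [ http-safe-get nip ] with-semaphore ] curry parallel-map sift ;

! Lowercase each address, then remove duplicates.
: unique-emails ( emails -- emails' )
    [ >lower ] map sets:members ;

! Drop URLs that http-binary? flags as binary, checking at most 8 at a time.
: pages-non-binary ( urls -- urls' )
    8 <semaphore> '[ _ [ http-binary? not ] with-semaphore ] parallel-filter ;

! Keep only URLs whose protocol is plain "http".
: pages-non-secured ( urls -- urls' )
    [ protocol>> "http" = ] filter ;

! Keep only URLs that have a host, i.e. absolute URLs.
: retain-absolute-urls ( urls -- urls' )
    [ host>> ] filter ;

! Apply all URL filters in sequence.
: safe-urls ( urls -- urls' )
    retain-absolute-urls pages-non-secured valid-pages pages-non-binary ;

! Full pipeline: filter the URLs, fetch the pages (8 at a time),
! extract the addresses from each page, and deduplicate them.
: urls>emails ( urls -- emails )
    safe-urls 8 http-get-all [ extract-emails2 ] map concat unique-emails ;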
