Paste: part2

Author: randy7
Mode: factor
Date: Fri, 6 Mar 2009 22:06:21
USING: accessors assocs combinators goog-search.search kernel
math quoting seperate-text sequences combinators.short-circuit ;
IN: goog-search.results

TUPLE: result rank url title summary ;
TUPLE: result-list offset entries ;

: <result> ( -- tuple ) result new ;

CONSTANT: begin-entry "<h3 class=r>"
CONSTANT: end-entry "</cite>"

: get-results ( text -- seq )
    begin-entry end-entry extract-all ; 

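! Number the entries from offset + 1, so ranks continue across result pages
! (offset 10 gives ranks 11, 12, ...).
: add-rank ( result-list -- result-list' )
    [
        [ entries>> ] [ offset>> ] bi
        [ + 1+ >>rank ] curry map-index
    ] keep swap >>entries ;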


: trim-spaces+quotes ( text -- text' )
    [ { [ CHAR: \s = ] [ CHAR: " = ] } 1|| ] trim ;

: get-url ( text -- url ) ! TODO: extract the URL with a regexp instead.
    "<a href=" "class=l" extract-one first
    "http" "\"" extract-one first trim-spaces+quotes ;

: get-title ( text -- title )
    "<h3 class=r>" "</h3>" extract-one first strip-tags trim-spaces+quotes ;

: get-summary ( text -- summary )
    "<div class=" "<cite>" extract-one dup empty?
    [ drop "" ] [ first strip-tags trim-spaces+quotes ] if ;
    
: make-result ( text -- result )
    [ <result> ] dip    
    {
        [ get-url >>url ]
        [ get-title >>title ]
        [ get-summary >>summary ]
    } cleave ;
    

: raw>results ( tuple -- results-seq )
    raw-results>> get-results [ make-result ] map ;

! Main entry point: call this on the query tuple after the search has run.
: results ( query-tuple -- result-list )
    [ raw>results ]
    [ parameters>> "start" swap at 0 or ] bi ! offset defaults to 0
    result-list new swap >>offset swap >>entries add-rank ;    

    
