Paste: word-suffix-array 3

Author: prunedtree
Mode: factor
Date: Wed, 1 Oct 2008 10:53:19
Plain Text |
! Copyright (C) 2008 Marc Fauconneau.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel arrays math accessors sequences math.vectors math.order sorting
       binary-search sets assocs fry ;
IN: suffix-array

! this suffix array is a sorted array of { word startpos } suffixes
! query is efficient through binary searches

: load ( loc -- seq ) first2 [ name>> ] dip tail-slice ;

: suffix<=> ( loc loc -- <=> ) [ load ] bi@ <=> ;

: suffixes ( word -- suffixes-seq ) dup name>> length [ 2array ] with map ;
    
: new-sa ( words -- sa )  [ suffixes ] map concat [ suffix<=> ] sort ;

: may-compare ( ? <=> -- <=> ) swap [ drop +eq+ ] [ ] if
 
: find-one ( sa seq -- index )
    '[ load  _ [ head? ] [ swap <=> ] 2bi may-compare ] search drop ;
   
: (fetch-more) ( index sa seq quot -- index' ? )
    [ [ dupd nth load ] dip head? ] dip swap slip ;

: fetch-more ( sa seq first-index quot -- matches )
    swap '[ [ _ _ _ (fetch-more) ] ] dip swap [ dup ] [ ] produce nip ;

: query-sa ( seq sa -- matches )
    [ 
        swap 2dup find-one dup
        '[ [ 1 - ] [ 1 + ] [ [ _ _ _ ] dip fetch-more ] bi@ _ ] call
        1array 3array concat
    ] keep '[ _ nth first ] map prune ;
    
        
! "test" all-words 10 head new-sa query-sa .    

New Annotation

Summary:
Author:
Mode:
Body: