Paste: word-suffix-array

Author: prunedtree
Mode: factor
Date: Wed, 1 Oct 2008 09:57:29
Plain Text |
! Copyright (C) 2008 Marc Fauconneau.
! See http://factorcode.org/license.txt for BSD license.
USING: kernel arrays math accessors sequences math.vectors math.order sorting
       binary-search sets assocs ;
IN: suffix-array

! this suffix array is a sorted array of { word startpos } suffixes
! query is efficient through binary searches

: load ( loc -- seq ) first2 [ name>> ] dip tail-slice ;

: suffix<=> ( loc loc -- <=> ) [ load ] bi@ <=> ;

: suffixes ( word -- suffixes-seq ) [ name>> length ] keep [ swap 2array ] curry map ;

: true-reduce ( seq quot -- seq' )
  over length 0 = not
  [ [ unclip swap ] dip each ] 
  [ 2drop f ] if ;
    
: new-sa ( words -- sa )
  [ suffixes ] map [ append ] true-reduce 
  [ suffix<=> ] sort ;

: find-one ( seq sa -- index )
   [ load 2dup swap head? [ 2drop +eq+ ] [ <=> ] if ] with search drop ;
 

: (fetch-more) ( index seq sa quot -- index' ? )
  [ pick swap nth load swap head? ] dip swap slip ;

: fetch-more ( seq sa first-index quot -- matches )
  swap [ [ (fetch-more) ] 3curry ] dip
  swap [ dup ] [ ] produce nip ;

: 3with ( obj1 obj2 obj3 ignore quot -- ignore curry ) with with with ; inline

: query-sa ( seq sa -- matches )
    [ 2dup find-one
      [ [ 1 - ] [ fetch-more ] 3with [ 1 + ] swap bi@ ] keep
      1array 3array [ append ] true-reduce 
    ] keep [ nth first ] curry map unique values ;
    
    
    
! "test" all-words 10 head new-sa query-sa .   

New Annotation

Summary:
Author:
Mode:
Body: