! Copyright (C) 2008 Marc Fauconneau. ! See http://factorcode.org/license.txt for BSD license. USING: kernel arrays math accessors sequences math.vectors math.order sorting binary-search sets assocs ; IN: suffix-array ! this suffix array is a sorted array of { word startpos } suffixes ! query is efficient through binary searches : load ( loc -- seq ) first2 [ name>> ] dip tail-slice ; : suffix<=> ( loc loc -- <=> ) [ load ] bi@ <=> ; : suffixes ( word -- suffixes-seq ) [ name>> length ] keep [ swap 2array ] curry map ; : true-reduce ( seq quot -- seq' ) over length 0 = not [ [ unclip swap ] dip each ] [ 2drop f ] if ; : new-sa ( words -- sa ) [ suffixes ] map [ append ] true-reduce [ suffix<=> ] sort ; : find-one ( seq sa -- index ) [ load 2dup swap head? [ 2drop +eq+ ] [ <=> ] if ] with search drop ; : (fetch-more) ( index seq sa quot -- index' ? ) [ pick swap nth load swap head? ] dip swap slip ; : fetch-more ( seq sa first-index quot -- matches ) swap [ [ (fetch-more) ] 3curry ] dip swap [ dup ] [ ] produce nip ; : 3with ( obj1 obj2 obj3 ignore quot -- ignore curry ) with with with ; inline : query-sa ( seq sa -- matches ) [ 2dup find-one [ [ 1 - ] [ fetch-more ] 3with [ 1 + ] swap bi@ ] keep 1array 3array [ append ] true-reduce ] keep [ nth first ] curry map unique values ; ! "test" all-words 10 head new-sa query-sa .