Paste: simple string similarity

Author: randy7
Mode: factor
Date: Mon, 26 Jan 2009 11:02:18
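! A two-minute implementation of Simon White's "How to Strike a Match" algorithm:
! http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/1/
! Score = 2 * (shared adjacent letter pairs) / (total letter pairs in both strings).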


USING: grouping kernel math sequences sets unicode.case ;

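: similarity ( string string -- n )
    ! Uppercase both strings and cut each into overlapping 2-character clumps,
    ! then compute 2 * shared clumps / total clumps.
    ! Note: if both strings are shorter than 2 characters, this divides 0 by 0.
    [ >upper 2 clump ] bi@ [ intersect length 2 * ] 2keep
    [ length ] bi@ + / ;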
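
! Worked example for the listener (an illustration, not part of the original paste):
! "FRANCE" 2 clump gives FR RA AN NC CE, and "FRENCH" 2 clump gives FR RE EN NC CH.
! The clumps FR and NC are shared, so the score is 2 * 2 / (5 + 5):
!
!     "FRANCE" "FRENCH" similarity .    ! prints 2/5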

! By the way, in case I haven't mentioned it before: you may use any code that I paste here.

Annotation: sentence similarity

Author: randy7
Mode: factor
Date: Mon, 26 Jan 2009 12:04:16
! A different algorithm: an attempt at sentence similarity.
! Each word of the first sentence is matched exactly against every word of the second.
! Word order is not taken into account, which is a shame.

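USING: fry kernel math sequences splitting ;

! Fraction of the items in seq that are equal to item.
: (one-result) ( item seq -- n )
    swap [ = ] curry map
    [ [ ] filter length ] [ length ] bi / ;

! Sum, over the words of the first sentence, of each word's match fraction
! in the second sentence. The '[ _ ... ] syntax needs the fry vocabulary.
: sentence-similarity ( sentence sentence -- n )
    [ " " split ] bi@
    '[ _ (one-result) ] map sum ;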

! Instead of exact matches, (one-result) could use the similarity word above to score
! partial matches, with the per-word scores averaged afterwards. This still works only
! word by word; nothing scores the order of the words.
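
! A hedged sketch of that partial-match idea, not the author's code. It assumes
! the similarity word from the first paste is loaded in the same vocabulary, and
! that supremum is available from sequences, as in current Factor. The names
! best-match and partial-sentence-similarity are made up for illustration.
! Caveat: similarity divides by zero when both words are shorter than two
! characters, so this sketch assumes longer words.

USING: kernel math sequences splitting ;

! Best similarity score of one word against every word of a sentence.
: best-match ( word words -- n )
    [ similarity ] with map supremum ;

! Average, over the words of the first sentence, of each word's best score.
! The result is not symmetric: swapping the sentences can change it.
: partial-sentence-similarity ( sentence sentence -- n )
    [ " " split ] bi@
    [ best-match ] curry map
    [ sum ] [ length ] bi / ;

! Listener usage:
!     "hello world" "hello word" partial-sentence-similarity .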
