Paste: LC53 cleaned up

Author: slava
Mode: factor
Date: Wed, 21 Oct 2009 04:11:16
Plain Text |
! Copyright (C) 2009 Hugo Aguilar.
! See for BSD license.
USING: kernel math math.functions sequences namespaces
combinators locals prettyprint io io.encodings.binary io.files
strings continuations system typed byte-arrays tools.time ;
IN: LC53

! ******
! ****** The following are for encryption.
! ******

CONSTANT: rnd-unity     4294967291      ! 2^32 - 5
CONSTANT: rnd-mult      3961633963      ! 2^32 - 333333333

: <prng> ( seed -- new-seed ) rnd-mult * rnd-unity mod ; inline
: <rnd-byte> ( seed -- new-seed r ) <prng> dup -24 shift ; inline

: crypt ( seed seq -- 'seq )
    ! this both encrypts and decrypts
    [ [ <rnd-byte> ] dip bitxor ] map nip ; inline

! ******
! ****** The following are for encrypting files.
! ******

CONSTANT: test-file "test.txt"
CONSTANT: test-seed 123456789

: load-file ( -- sample ) test-file binary file-contents ;
: show-file ( sample -- ) >string print ;

! ******
! ****** The following are for cracking the LC53 encryption.
! ****** We are assuming that the plaintext was a 7-bit-ascii text file.
! ****** This implies that the high bit of every byte is 0, which is what we are looking for.
! ******

: |word ( -- mask ) 31 2^ ; inline ! high bit of a word
: |byte ( -- mask )  7 2^ ; inline ! high bit of a byte

: how-far ( seed sample -- count|f )                           ! how far we went, or f if all the way
    [ [ <rnd-byte> ] dip bitxor |byte bitand 0 = not ] find  drop nip ; inline

SYMBOL: escape-continuation

: escape ( mask -- ) escape-continuation get continue-with ;

:: <find-seed> ( sample upper-count seed mask -- )
    mask 0 = [ ] [                                      ! if mask = 0, then no solution was found
        seed mask bitor             :> tryout           ! our tryout seed
        tryout sample how-far       :> count            ! the quality of our tryout seed
        count [                                         ! a count implies we didn't go all the way
            mask -1 shift           :> next-mask        ! the next lower bit mask
            upper-count count < [
                sample    count           seed mask bitor     next-mask   <find-seed>
                sample    upper-count     seed                next-mask   <find-seed>
            ] [
                sample    upper-count     seed                next-mask   <find-seed>
                sample    count           seed mask bitor     next-mask   <find-seed>
            ] if
        ] [                                             ! we found it! (our tryout went all the way)
            tryout escape
        ] if
    ] if ; inline recursive

TYPED:: find-seed ( sample: byte-array -- seed|f )
    [ escape-continuation set sample 0 0 |word <find-seed> f ] callcc1 ;

:: test ( -- )
    load-file :> sample
    sample show-file
    test-seed sample crypt :> sample
    nl "Sample file is encrypted and seed is zeroed; commencing cracking..." print flush
    [ sample find-seed ] time
    dup [
        [ nl "The seed is: " print  . ]
        [ nl "The file is: " print sample crypt show-file ] bi
    ] [
        nl "We did a complete search and were unable to find a valid seed." print
        nl "The file may have contained non-text characters (with the high-bit set)." print
    ] if ;


Annotation: possible improvement

Author: Hugh Aguilar
Mode: factor
Date: Mon, 15 Mar 2010 22:37:57
Plain Text |
Factor seems to have been improved in regard to integer arithmetic speed, as this version is several times faster than it used to be --- good job Slava!

I have a Forth version of this program in my novice package:

Some Common Lisp programmers also ported the program to CL and discussed it on comp.lang.lisp:
I don't know enough about Lisp to comment on their efforts.

All in all, LC53 is pretty inefficient. It uses HOW-FAR to determine whether to try a 1 or 0 first. The problem is that HOW-FAR often returns equal values for the 1 and 0 tries (usually zero), in which case <FIND-SEED> has to just arbitrarily choose. In the Factor program above, we have this line:
            upper-count count < [
If the values are equal, then we pretend that count were greater than upper-count. That is just arbitrary though. If you run the function TEST-HOW-FAR provided in the Forth version, you can see how often these values will be equal, as they are almost always tiny little numbers.

The problem with <FIND-SEED> is that it commits itself to a full recursive search when it decides to try the 1 or 0. The best improvement in the program would be that <FIND-SEED>, if it has equal values, would do a partial recursive search of each side of the tree, and use the best HOW-FAR value from each side. It would then be able to make a better-informed decision as to which side it should commit itself to doing a full recursive search on. Right now, LC53 does over 400 million visits to <FIND-SEED> to find a 32-bit key (the test case of the key being 123456789). That is way too many. With an improved criteria for deciding whether to try 1 or 0  first, it should be able to get the number of visits to <FIND-SEED> down to a few hundred. 

A program like this might be helped if it were written in a language that supported lazy evaluation. Haskell is the only such language that I know of. When <FIND-SEED> gets equal HOW-FAR values and has to do a partial recursive search, it would be nice if the results of that shallow search were saved somehow. <FIND-SEED> will decide on one path or the other, in which case it will be going over that same ground when it then does the full recursive search in that direction. This is exactly the kind of thing that lazy evaluation is for. We would pretend to have the HOW-FAR values for the entire lot of possible keys available (4.2 billion for a 32-bit key). We would only run HOW-FAR in the cases in which HOW-FAR hadn't been run already, but would otherwise just grab the value that we already calculated.

Even without lazy evaluation though, it would still be worthwhile to do the shallow searches first in order to obtain better criteria for committing ourselves to a full search. Even though we would be duplicating effort, we would be making better decisions and would be significantly reducing our overall effort. An improvement such as this would be necessary if the program were to crack a 64-bit linear-congruential encryption scheme. Right now the Factor program takes about 100 seconds to run using a 32-bit key. Extrapolating, this means that it would take about 420 billion seconds to run using a 64-bit key, which is about 12.7 millenia. With the upgrade described above however, the time should be reduced to a few minutes.

I may write this upgrade later on, but I'm busy with other programs right now. I just provided this discussion in case anybody else wants to take a stab at it. LC53 is just a toy program without any practical value, but it is an interesting exercise. Have fun!

New Annotation