Author: | komu |
---|---|

Mode: | literate-haskell |

Date: | Tue, 27 Oct 2009 07:16:36 |

> import Control.Arrow ((&&&)) > import Data.List (group, sort, insert) > import qualified Data.Map as M > import Test.QuickCheck (quickCheck, (==>), Property) We could represent bit strings by a special type, but for simplicity we'll simply assume that they consist of a list of zeros and ones. > type BitString = [Int] The frequency-dictionary for a type is simply list of pairs where each object is associated with a frequency: > type Freqs a = [(a,Int)] The tree used for encoding is a binary tree with frequencies of individual elements of our alphabet in the leafs. To make matters speedier, the branches also contain a pre-calculated sum of leafs in that subtree. > data FreqTree a = Leaf a Int > | Branch (FreqTree a) (FreqTree a) Int > deriving (Eq, Show) > frequency :: FreqTree a -> Int > frequency (Leaf _ f) = f > frequency (Branch _ _ f) = f We can provide a partial ordering for trees by comparing their frequencies: > instance Eq a => Ord (FreqTree a) where > compare l r = compare (frequency l) (frequency r) To merge trees, we simply create a new branch that has the combined frequency of the subtrees. > merge :: FreqTree a -> FreqTree a -> FreqTree a > merge l r = Branch l r (frequency l + frequency r) To construct the encoding tree, we need to find out the frequencies of objects in the input data: > frequencies :: Ord a => [a] -> Freqs a > frequencies = map (head &&& length) . group . sort Building the tree is a simple process. First we'll create a list of leaf nodes and sort it so that the most rare elements (trees with lowest frequency) are the first. We'll continue merging the first trees of the list to new trees and push the new trees onto their proper place in the list until we are left with one node. This is our final tree. Finally, we normalize the tree to handle degenerate cases where the whole tree contains only one leaf node: we'll put that leaf node under a branch so that encoding and decoding can assume that all trees contain at least one branch. > buildTree :: Ord a => Freqs a -> FreqTree a > buildTree = normalize . build . sort . map (uncurry Leaf) > where > build [] = error "can't build tree without input" > build [t] = t > build (t1:t2:ts) = build (insert (merge t1 t2) ts) > normalize t@(Branch _ _ _) = t > normalize t@(Leaf _ _) = merge t t The Huffman codes of leafs are their paths from root to elements. We can create these simply by recursing down the tree and adding 0 to path for every left and 1 for every right we take down the recursion. > createDictionary :: Ord a => FreqTree a -> M.Map a BitString > createDictionary = M.fromList . paths [] > where > paths p (Leaf a _) = [(a,reverse p)] > paths p (Branch l r _) = paths (0:p) l ++ paths (1:p) r Now that we have a way to build a dictionary, we can simply look up the encodings for different elements from the dictionary. > encode :: Ord a => Freqs a -> [a] -> BitString > encode d s = concatMap encoder s > where > encoder a = maybe (error "unknown input") id (M.lookup a dict) > dict = createDictionary (buildTree d) When decoding, we can just consume bits from the input until we end up in a leaf. Whenever we end up in the leaf, we have a new output elements and start consuming bits from the root again. > decode :: Ord a => Freqs a -> BitString -> [a] > decode d s = loop s tree > where > loop xs (Leaf a _) = a : loop xs tree > loop (0:xs) (Branch l _ _) = loop xs l > loop (1:xs) (Branch _ r _) = loop xs r > loop [] _ = [] > loop _ _ = error "malformed bit-string" > tree = buildTree d Finally, we can check with QuickCheck that decoding is inverse of encoding. > prop_decoding_is_inverse_for_decoding :: [Int] -> Property > prop_decoding_is_inverse_for_decoding xs = > not (null xs) ==> decode fs (encode fs xs) == xs > where fs = frequencies xs > test :: IO () > test = quickCheck prop_decoding_is_inverse_for_decoding

Summary: | |
---|---|

Author: | |

Mode: | |

Body: | |