ยง Edit distance

This implementation of edit distance crystallizes the fact that when computing edit distance, we only ever move forwards on solving the problem. we do not store the results of the overlapping computations, though we could. Rather, the goal of this implementation is to capture the traversal pattern necessary for edit distance into a Cursor, and to view the edit distance problem from the point of view of this Cursor. The problem setting is that we have a source string, a destination string, and we wish to perform operations that convert the source string into the destination string. The operations allowed are to:
  • Insert a character from the destination to the source.
  • Remove a character from the source.
  • Replace a character in the source with a character from the destination.
We want to minimise the number of operations to be used. Let's see how we model this.
{-# LANGUAGE ViewPatterns #-}
type Ix = Int; type DestIx = Ix; type SrcIx = Ix;
data Move = InsertFromDest DestIx |
            RemoveFromSrc SrcIx |
            ReplaceSrcWithDest SrcIx DestIx
        deriving(Show)
Our cost model says that each move costs 1. We are charged for every move we make. We are to minimize the number of operations.
movescost :: [Move] -> Int; movescost = length
We model this as us having a Cursor which contains list [a] and information about where we are in the list as an Ix. This is the same as a Zipper for a list, except that in this case, we only allow ourselves to walk forward.
data Cursor a = Cursor Ix [a]
  • cdone tells is if we have completely consumed a cursor.
  • cix tells us the index of the cursor.
  • cval lets us dereference a cursor.
  • incr lets us move a cursor to the next array entry.
  • cursor converts a list into a Cursor at the first index.
cdone :: Cursor a -> Bool; cdone (Cursor ix vs) = ix >= length vs
cix :: Cursor a -> Ix; cix (Cursor ix _) = ix
cval :: Cursor a -> a; cval c@(Cursor ix vs) = vs !! ix
incr :: Cursor a -> Cursor a; incr (Cursor ix vs) = Cursor (ix+1) vs
cursor :: [a] -> Cursor a; cursor = Cursor 0
We implement edit, that tells us how to edit the source string into the destination string. The convention is edit .
-- | decide how to get ixth character of bs from our as.
edit :: Eq a => Cursor a -> Cursor a -> [Move]
  • 1. If both strings have been consumed, then no moves are to be made.
edit (cdone -> True) (cdone -> True) = []
  • 2. If the destination string has been fully matched while the source string has not, then remove characters from the source string.
edit a@(cdone -> False) b@(cdone -> True) =
  (RemoveFromSrc (cix a)):edit (incr a) b
  • 3. If the source string has run out of characters while the destination string still has characters, insert characters from the destination string.
edit a@(cdone -> True) b@(cdone -> False) =
  (InsertFromDest (cix b)):edit a (incr b)
  • 4. Otherwise, we have characters remaining in both strings. Try the options of (1) replacing a source character with a destination character (2) removing a character from the source and continuing, and (3) if the current characters match, then keep the match and try to combine characters that come later in the string. We pick the best out of these using the argmin combinator.
edit a b =
  let nomatch = argmin movescost
                (ReplaceSrcWithDest (cix a) (cix b):edit (incr a) (incr b))
                (RemoveFromSrc (cix a):edit (incr a) b)
  in case cval a == cval b of
      True -> argmin movescost nomatch (edit (incr a) (incr b))
      False -> nomatch
The helper used for finding minimum according to a cost model.
argmin :: (a -> Int) -> a -> a -> a
argmin f a a' = if (f a) < (f a') then a else a'