§ Edit distance
This implementation of edit distance crystallizes the fact that when computing
edit distance, we only ever move forwards on solving the problem. we do not
store the results of the overlapping computations, though we could. Rather,
the goal of this implementation is to capture the traversal pattern necessary
for edit distance into a Cursor
, and to view the edit distance problem from
the point of view of this Cursor
.
The problem setting is that we have a source string, a destination
string, and we wish to perform operations that convert the source
string into the destination string. The operations allowed are to:
- Insert a character from the destination to the source.
- Remove a character from the source.
- Replace a character in the source with a character from the destination.
We want to minimise the number of operations to be used.
Let's see how we model this.
type Ix = Int; type DestIx = Ix; type SrcIx = Ix;
data Move = InsertFromDest DestIx |
RemoveFromSrc SrcIx |
ReplaceSrcWithDest SrcIx DestIx
deriving(Show)
Our cost model says that each move costs 1
. We are charged for every
move we make. We are to minimize the number of operations.
movescost :: [Move] -> Int; movescost = length
We model this as us having a Cursor
which contains list [a]
and information
about where we are in the list as an Ix
.
This is the same as a Zipper
for a list, except that in this case, we only
allow ourselves to walk forward.
data Cursor a = Cursor Ix [a]
-
cdone
tells is if we have completely consumed a cursor. -
cix
tells us the index of the cursor. -
cval
lets us dereference a cursor. -
incr
lets us move a cursor to the next array entry. -
cursor
converts a list into a Cursor
at the first index.
cdone :: Cursor a -> Bool; cdone (Cursor ix vs) = ix >= length vs
cix :: Cursor a -> Ix; cix (Cursor ix _) = ix
cval :: Cursor a -> a; cval c@(Cursor ix vs) = vs !! ix
incr :: Cursor a -> Cursor a; incr (Cursor ix vs) = Cursor (ix+1) vs
cursor :: [a] -> Cursor a; cursor = Cursor 0
We implement edit
, that tells us how to edit the source string into
the destination string. The convention is edit
.
edit :: Eq a => Cursor a -> Cursor a -> [Move]
- 1. If both strings have been consumed, then no moves are to be made.
edit (cdone -> True) (cdone -> True) = []
- 2. If the destination string has been fully matched while the source string has not, then remove characters from the source string.
edit a@(cdone -> False) b@(cdone -> True) =
(RemoveFromSrc (cix a)):edit (incr a) b
- 3. If the source string has run out of characters while the destination string still has characters, insert characters from the destination string.
edit a@(cdone -> True) b@(cdone -> False) =
(InsertFromDest (cix b)):edit a (incr b)
- 4. Otherwise, we have characters remaining in both strings. Try the options of (1) replacing a source character with a destination character (2) removing a character from the source and continuing, and (3) if the current characters match, then keep the match and try to combine characters that come later in the string. We pick the best out of these using the
argmin
combinator.
edit a b =
let nomatch = argmin movescost
(ReplaceSrcWithDest (cix a) (cix b):edit (incr a) (incr b))
(RemoveFromSrc (cix a):edit (incr a) b)
in case cval a == cval b of
True -> argmin movescost nomatch (edit (incr a) (incr b))
False -> nomatch
The helper used for finding minimum according to a cost model.
argmin :: (a -> Int) -> a -> a -> a
argmin f a a' = if (f a) < (f a') then a else a'