§ Suffix Automata
- We take for granted knowledge of the Myhill nerode theorem to build the minimal automata of the set of suffixes of a string l.
- Let the alphabet be A, and let us build the suffix automata of l∈A⋆.
- Define the language of suffixes of a string l as L≡{l[i:]:i∈N}.
- By Myhill Nerode, states of the minimal DFA correspond to strings that are indistinguishable under extensions by a membership oracle of L.
- Suppose a state in the DFA corresponds to two strings b,s∈A⋆ ( b for big and s for small) such that ∣b∣≥∣s∣. So we have that b=Ls.
- Now, for all strings z such that bz∈L we also have sz∈L.
- So the string must look like follows:
----bbbbzzzzzzzzz
------sszzzzzzzzz
- This implies that s is a suffix of b!
- Strings in the same state correspond to suffixes of the largest string in the state.
- Next, we claim that a state consists of all suffixes upto some length. TODO.
- Therefore, it is helpful to imagine states as "funnels" or "triangles" or "narrowing trapeziums", which have at the top the longest string, and then shorter and shorter suffixes. The suffix link from
a
to a->link
points from the base of a
to the top of the trapezium a->link
such that a
and a->link
can be "joined" into a larger trapezium.
§ Suffix Automata must be a DAG
- A cycle in the automata implies that we have an infinite number of strings in the language, since we can traverse along the cycle as many times as we want before we reach a final state.
- Thus, the suffix automata, which accepts a finite language must be a DAG.
- This implies that we can perform dynamic programming on the DAG.
§ Suffix automata: relinking q
to qsmol
- Suppose we are inserting a character
c
. We are at a state p
which points to a state q
on transitition c
. - Now since
q
contains p:c
, we have that len(q) > len(p)
. If p:c
is the longest string of q
, then len(p) + 1 = len(q)
. Otherwise, q
has longest string a string which contains p:c
as a proper suffix, and thus len(p) + 1 < len(q)
. - Since
q
contains p:c
, the suffix link at q
must point to a state which is a proper prefix of p:c
. - If I therefore create a state with longest string
p:c
, this state p:c
has a longest string longer than q->link
. - Thus, it is proper to attach
q->link
to the newly created state.