§ Prefix/Border function


Function that for a string s, at index i, returns the length of the longest border of s[0..i] (inclusive). For example, consider the string s=abababcaab.

In toto, the prefix function is:
  ababcaab
  01234012

§ s[0..i] has a border of length at pr(i+1)-1



s[0..i+1] = p[0]p[1]...p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L]
            ^^^^^^^^^^^^^^^                     ^^^^^^^^^^^^^^^

If I now want to consider s[0..i], I need to drop the last letter p[L], which leaves me with:
s[0..i] = p[0]p[1]...p[L-1]p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L-1]
          ^^^^^^^^^^^^^^^^^                         ^^^^^^^^^^^^^^^^^


§ Lemma: Enumerating borders (what is a good name for this lemma?)



123456----123456
     ^         ^
     L         N


ab3456----1234ab
     ^         ^
     L         N


ab34ab----ab34ab
     ^         ^
     L         N


§ Algorithm to enumerate borders.


All borders of s[0:N] can be enumerated by:
  1. Taking the longest border of length L of s[0:N] (given by pr(N)).
  2. considering the border as being the substring s[0:L] (given by pr(L)).
  3. Taking the longest border of s[0:L]
  4. ... recurse till we hit empty string.

§ Computing pr(i+1)


  1. We know that pr(0) = 0.
  2. At pr(i+1), we know that s[0:pr(i)] = s[-pr(i):-1]

§ Border function is a fractal


Consider a general string which looks like:
abcdef--abcdef

If we think of the second longest border, say 12, we will get that the string must be:
12cdef--abcd12
     ^^  &&

But this implies that the occurrence of ef (marked with &&) and the occurrence of ab (marked with ##) must be 12, 12 respectively. This means that the string is:
12cd12--12cd12

So we started with a full string:
------------------

Then matched the left and right (very 1/3 cantor):
-------    ------
       ----

Then matched the left and right of these again and unified the leftovers, finding that it looks like:
--  --   --  --
  --       --
      ---

and so on. Isn't this so cool? Borders of a string are a fractal-like object!