§ Prefix/Border function

Function that for a string s, at index i, returns the length of the longest border of s[0..i] (inclusive). For example, consider the string s=abababcaab.
  • at i=0, we have substring s[0..0]=a which has no border (border is proper prefix/suffix). So pr(0) = 0.
  • at i=1, we have substring s[0..1]=ab which has no border, so pr(1) = 0.
  • at i=2, we have substring s[0..2]=aba which has a..a as border. pr(2) = 1
  • at i=3, we have substring s[0..3]=abab which has ab..ab as border. pr(3) = 2
  • at i=4, we have substring s[0..4]=ababa which has ab[a]ba as border (that is, the prefix is aba and the suffix is aba which overlaps). pr(4) = 3.
  • at i=5, we have substring s[0..5]=ababab which has ab[ab]ab as border (that is, the prefix is abab and the suffix is abab which overlaps). pr(5) = 4.
  • at i=6, we have substring s[0..6]=abababc which has no border. pr(6) = 0.
  • at i=7, we have substring s[0..7]=abababca which has border a..a. pr(7) = 1.
  • at i=8, we have substring s[0..8]=abababcab which has border ab..ab. pr(8) = 2.
In toto, the prefix function is:

§ s[0..i] has a border of length at pr(i+1)-1

  • That is, given the substring s[0..i], we can predict that s[0..i] will have some border (perhaps not the longest border) of length pr(i+1)-1.
  • Suppose s[0..i+1] has longest border of length L=pr(i+1) (by definition of pr). Suppose pr(i+1) >= 1. Then I can write s[0..i+1] as:
s[0..i+1] = p[0]p[1]...p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L]
            ^^^^^^^^^^^^^^^                     ^^^^^^^^^^^^^^^
If I now want to consider s[0..i], I need to drop the last letter p[L], which leaves me with:
s[0..i] = p[0]p[1]...p[L-1]p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L-1]
          ^^^^^^^^^^^^^^^^^                         ^^^^^^^^^^^^^^^^^
  • where we have a border of p[0]...p[L-1].
  • There maybe a longer border, involving other terms of s[0..i]. But we know that the border is at least pr(i) >= pr(i+1)-1.
  • Upon re-arranging, we see that pr(i+1) <= pr(i) + 1.
  • This tells us that the border can increase by at most 1 (it can drop to zero, no lower bound!). So we have: 0 <= pr(i+1) <= pr(i) + 1.
  • So if we think of borders of s[0..i+1], we know that the longest border can be of length of border pr(i) + 1. All other borders will be of length <= pr(i), so these other borders will be borders of s[0..i]!
  • Thus, to move from s[0..i] to s[0..(i+1)], we simply need to be able to find the longest border of s[0..(i+1)]. All other borders will come from s[0..i].

§ Lemma: Enumerating borders (what is a good name for this lemma?)

  • Think of the longest border 123456:
     ^         ^
     L         N
  • Now consider a shorter border ab:
     ^         ^
     L         N
  • But we must have a~1 and b~2 since it's the same string! So the border is really ab34ab, and the string is:
     ^         ^
     L         N
  • This shows that given the longest border 123456 of s[0:N] which has length L, any other border of s[0:N](such as ab) is also a border of s[0:L].
  • Generalizing, given the longest border of s[0..n] of length L, any smaller border of s[0:N]is a border of s[0:L].

§ Algorithm to enumerate borders.

All borders of s[0:N] can be enumerated by:
  1. Taking the longest border of length L of s[0:N] (given by pr(N)).
  2. considering the border as being the substring s[0:L] (given by pr(L)).
  3. Taking the longest border of s[0:L]
  4. ... recurse till we hit empty string.

§ Computing pr(i+1)

  1. We know that pr(0) = 0.
  2. At pr(i+1), we know that s[0:pr(i)] = s[-pr(i):-1]

§ Border function is a fractal

Consider a general string which looks like:
If we think of the second longest border, say 12, we will get that the string must be:
     ^^  &&
But this implies that the occurrence of ef (marked with &&) and the occurrence of ab (marked with ##) must be 12, 12 respectively. This means that the string is:
So we started with a full string:
Then matched the left and right (very 1/3 cantor):
-------    ------
Then matched the left and right of these again and unified the leftovers, finding that it looks like:
--  --   --  --
  --       -- 
and so on. Isn't this so cool? Borders of a string are a fractal-like object!