A Universe of Sorts

§ Prefix/Border function

Function that for a string s, at index i, returns the length of the longest border of s[0..i] (inclusive). For example, consider the string s=abababcaab.

at i=0, we have substring s[0..0]=a which has no border (border is proper prefix/suffix). So pr(0) = 0.
at i=1, we have substring s[0..1]=ab which has no border, so pr(1) = 0.
at i=2, we have substring s[0..2]=aba which has a..a as border. pr(2) = 1
at i=3, we have substring s[0..3]=abab which has ab..ab as border. pr(3) = 2
at i=4, we have substring s[0..4]=ababa which has ab[a]ba as border (that is, the prefix is aba and the suffix is aba which overlaps). pr(4) = 3.
at i=5, we have substring s[0..5]=ababab which has ab[ab]ab as border (that is, the prefix is abab and the suffix is abab which overlaps). pr(5) = 4.
at i=6, we have substring s[0..6]=abababc which has no border. pr(6) = 0.
at i=7, we have substring s[0..7]=abababca which has border a..a. pr(7) = 1.
at i=8, we have substring s[0..8]=abababcab which has border ab..ab. pr(8) = 2.

In toto, the prefix function is:

  ababcaab
  01234012

§ `s[0..i]` has a border of length at `pr(i+1)-1`

That is, given the substring s[0..i], we can predict that s[0..i] will have some border (perhaps not the longest border) of length pr(i+1)-1.
Suppose s[0..i+1] has longest border of length L=pr(i+1) (by definition of pr). Suppose pr(i+1) >= 1. Then I can write s[0..i+1] as:

s[0..i+1] = p[0]p[1]...p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L]
            ^^^^^^^^^^^^^^^                     ^^^^^^^^^^^^^^^

If I now want to consider s[0..i], I need to drop the last letter p[L], which leaves me with:

s[0..i] = p[0]p[1]...p[L-1]p[L]|s[L+1]...s[i+1-L-1]|p[0]p[1]...p[L-1]
          ^^^^^^^^^^^^^^^^^                         ^^^^^^^^^^^^^^^^^

where we have a border of p[0]...p[L-1].
There maybe a longer border, involving other terms of s[0..i]. But we know that the border is at least pr(i) >= pr(i+1)-1.
Upon re-arranging, we see that pr(i+1) <= pr(i) + 1.
This tells us that the border can increase by at most 1 (it can drop to zero, no lower bound!). So we have: 0 <= pr(i+1) <= pr(i) + 1.
So if we think of borders of s[0..i+1], we know that the longest border can be of length of border pr(i) + 1. All other borders will be of length <= pr(i), so these other borders will be borders of s[0..i]!
Thus, to move from s[0..i] to s[0..(i+1)], we simply need to be able to find the longest border of s[0..(i+1)]. All other borders will come from s[0..i].

§ Lemma: Enumerating borders (what is a good name for this lemma?)

Think of the longest border 123456:

123456----123456
     ^         ^
     L         N

Now consider a shorter border ab:

ab3456----1234ab
     ^         ^
     L         N

But we must have a~1 and b~2 since it's the same string! So the border is really ab34ab, and the string is:

ab34ab----ab34ab
     ^         ^
     L         N

This shows that given the longest border 123456 of s[0:N] which has length L, any other border of s[0:N](such as ab) is also a border of s[0:L].
Generalizing, given the longest border of s[0..n] of length L, any smaller border of s[0:N]is a border of s[0:L].

§ Algorithm to enumerate borders.

All borders of s[0:N] can be enumerated by:

Taking the longest border of length L of s[0:N] (given by pr(N)).
considering the border as being the substring s[0:L] (given by pr(L)).
Taking the longest border of s[0:L]
... recurse till we hit empty string.

§ Computing `pr(i+1)`

We know that pr(0) = 0.
At pr(i+1), we know that s[0:pr(i)] = s[-pr(i):-1]

§ Border function is a fractal

Consider a general string which looks like:

abcdef--abcdef

If we think of the second longest border, say 12, we will get that the string must be:

12cdef--abcd12
     ^^  &&

But this implies that the occurrence of ef (marked with &&) and the occurrence of ab (marked with ##) must be 12, 12 respectively. This means that the string is:

12cd12--12cd12

So we started with a full string:

------------------

Then matched the left and right (very 1/3 cantor):

-------    ------
       ----

Then matched the left and right of these again and unified the leftovers, finding that it looks like:

--  --   --  --
  --       --
      ---

and so on. Isn't this so cool? Borders of a string are a fractal-like object!