A Universe of Sorts

§ Z algorithm

The Z algorithm, for a given string $s$ , computes a function $Z: [len(s)] \rightarrow [len(s)]$ .
$Z[i]$ is the length of the longest common prefix between $S$ and $S[i:]$ .
So, $S[0] = S[i]$ , $S[1] = S[i+1]$ , $S[2] = S[i+2]$ , and so on till $S[Z[i]] = S[i + Z[i]]$ , and then $S[Z[i]+1] \neq S[i + Z[i] + 1]$ .

If we can compute the Z function for a string, we can then check if pattern Pis a substring of text T by constructing the string P#T$. Then, if we have an index such that Z[i] = len(P), we know that at that index, we have the string P as a substring.

Note that the Z-algorithm computes the Z function in linear time .

The key idea of the Z algorithm is to see that if we are at an index i, but we have an index l < isuch that i < l + z[l], then we have that s[0:z[l]] = s[l:l+z[l]]. Thus, we are "in the shade" of the l.

In this situation, we can reuse z[i-l] as a seed for z[i]. There are two cases: i + z[i-l] > l + z[l] and the converse.
If i + z[i-l] < l + z[l], then we are still "in the shade" of l, so we can safely set z[i] = z[i-l].
If not, we set z[i] = l + z[l], since we know that we match at least this much of the beginning of the string.

§ Specification

vector<int> calcz(std::string s) {
 const int n = s.size();
 vector<int> z(n);
 z[0] = 0;
 for(int i = 1; i < s.size(); ++i) {
   z[i] = 0;
   while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
     z[i]++;
   }
 }

 return z;
}

§ Implementation

vector<int> myz(std::string s) {
    const int n = s.size();
    vector<int> z(n);
    for(int i = 0; i < n; ++i) { z[i] = 0; }

    // shade that was last computed.
    int l = 0;
    for(int i = 1; i < n; ++i) {
        // shade: (l + z[l]) - i
        // guess from start: z[i-l]
        z[i] = max(0, min(l + z[l] - i, z[i-l]));

        // compare with initial portion of string.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }

        // we exceed the current shade. Begin ruling.
        if (i + z[i] >= l + z[l]) { l = i; }
    }

    return z;
}

Reference: Algorithms on strings, trees, and sequences.