## § Z algorithm

• The Z algorithm, for a given string $s$, computes a function $Z: [len(s)] \rightarrow [len(s)]$.
• $Z[i]$ is the length of the longest common prefix between $S$ and $S[i:]$.
• So, $S[0] = S[i]$, $S[1] = S[i+1]$, $S[2] = S[i+2]$, and so on till $S[Z[i]] = S[i + Z[i]]$, and then $S[Z[i]+1] \neq S[i + Z[i] + 1]$.
• If we can compute the Z function for a string, we can then check if pattern Pis a substring of text T by constructing the string P#T\$. Then, if we have an index such that Z[i] = len(P), we know that at that index, we have the string P as a substring.
• Note that the Z-algorithm computes the Z function in linear time .
• The key idea of the Z algorithm is to see that if we are at an index i, but we have an index l < isuch that i < l + z[l], then we have that s[0:z[l]] = s[l:l+z[l]]. Thus, we are "in the shade" of the l.
• In this situation, we can reuse z[i-l] as a seed for z[i]. There are two cases: i + z[i-l] > l + z[l] and the converse.
• If i + z[i-l] < l + z[l], then we are still "in the shade" of l, so we can safely set z[i] = z[i-l].
• If not, we set z[i] = l + z[l], since we know that we match at least this much of the beginning of the string.

#### § Specification

vector<int> calcz(std::string s) {
const int n = s.size();
vector<int> z(n);
z[0] = 0;
for(int i = 1; i < s.size(); ++i) {
z[i] = 0;
while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
z[i]++;
}
}

return z;
}


#### § Implementation

vector<int> myz(std::string s) {
const int n = s.size();
vector<int> z(n);
for(int i = 0; i < n; ++i) { z[i] = 0; }

// shade that was last computed.
int l = 0;
for(int i = 1; i < n; ++i) {
// shade: (l + z[l]) - i
// guess from start: z[i-l]
z[i] = max(0, min(l + z[l] - i, z[i-l]));

// compare with initial portion of string.
while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }

// we exceed the current shade. Begin ruling.
if (i + z[i] >= l + z[l]) { l = i; }
}

return z;
}

• Reference: Algorithms on strings, trees, and sequences.